Showing 1–50 of 1,469 results for author: Zhang, J

Searching in archive eess.
  1. arXiv:2507.07526  [pdf, ps, other]

    cs.SD eess.AS

    DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction

    Authors: Cunhang Fan, Sheng Zhang, Jingjing Zhang, Enrui Liu, Xinhui Li, Minggang Zhao, Zhao Lv

    Abstract: Decoding speech from brain signals is a challenging research problem. Although existing technologies have made progress in reconstructing the mel spectrograms of auditory stimuli at the word or letter level, there remain core challenges in the precise reconstruction of minute-level continuous imagined speech: traditional models struggle to balance the efficiency of temporal dependency modeling and…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM MM 2025

  2. arXiv:2507.06971  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting

    Authors: Fei Teng, Kai Luo, Sheng Wu, Siyu Li, Pujun Guo, Jiale Wei, Kunyu Peng, Jiaming Zhang, Kailun Yang

    Abstract: Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task. Complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street view generation models have…

    Submitted 9 July, 2025; v1 submitted 9 July, 2025; originally announced July 2025.

    Comments: The source code will be publicly available at https://github.com/Bryant-Teng/Percep360

  3. arXiv:2507.05656  [pdf, ps, other]

    eess.IV cs.CV cs.LG q-bio.QM

    ADPv2: A Hierarchical Histological Tissue Type-Annotated Dataset for Potential Biomarker Discovery of Colorectal Disease

    Authors: Zhiyuan Yang, Kai Li, Sophia Ghamoshi Ramandi, Patricia Brassard, Hakim Khellaf, Vincent Quoc-Huy Trinh, Jennifer Zhang, Lina Chen, Corwyn Rowsell, Sonal Varma, Kostas Plataniotis, Mahdi S. Hosseini

    Abstract: Computational pathology (CoPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available datasets for CoPath that are annotated with extensive histological tissue type (HTT) taxonomies at a granular level remain scarce due to the significant expertise and high annotation costs required. Existing datasets, such as the At…

    Submitted 9 July, 2025; v1 submitted 8 July, 2025; originally announced July 2025.

    ACM Class: I.2.10; I.2.1

  4. arXiv:2507.05451  [pdf]

    eess.IV cs.CV eess.SP

    Self-supervised Deep Learning for Denoising in Ultrasound Microvascular Imaging

    Authors: Lijie Huang, Jingyi Yin, Jingke Zhang, U-Wai Lok, Ryan M. DeRuiter, Jieyang Jin, Kate M. Knoll, Kendra E. Petersen, James D. Krier, Xiang-yang Zhu, Gina K. Hesley, Kathryn A. Robinson, Andrew J. Bentall, Thomas D. Atwell, Andrew D. Rule, Lilach O. Lerman, Shigao Chen, Chengwu Huang

    Abstract: Ultrasound microvascular imaging (UMI) is often hindered by low signal-to-noise ratio (SNR), especially in contrast-free or deep tissue scenarios, which impairs subsequent vascular quantification and reliable disease diagnosis. To address this challenge, we propose Half-Angle-to-Half-Angle (HA2HA), a self-supervised denoising framework specifically designed for UMI. HA2HA constructs training pairs…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 12 pages, 10 figures. Supplementary materials are available at https://zenodo.org/records/15832003

  5. arXiv:2507.05177  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

    Authors: Chen Wang, Tianyu Peng, Wen Yang, Yinan Bai, Guangfu Wang, Jun Lin, Lanpeng Jia, Lingxiang Wu, Jinqiao Wang, Chengqing Zong, Jiajun Zhang

    Abstract: Empathetic interaction is a cornerstone of human-machine communication, due to the need for understanding speech enriched with paralinguistic cues and generating emotional and expressive responses. However, the most powerful empathetic LSLMs are increasingly closed off, leaving the crucial details about the architecture, data and development opaque to researchers. Given the critical need for trans…

    Submitted 8 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: Technical Report

  6. arXiv:2507.04821  [pdf, ps, other]

    eess.SY

    Force-IMU Fusion-Based Sensing Acupuncture Needle and Quantitative Analysis System for Acupuncture Manipulations

    Authors: Peng Tian, Kang Yu, Tianyun Jiang, Yuqi Wang, Haiying Zhang, Hao Yang, Yunfeng Wang, Jun Zhang, Shuo Gao, Junhong Gao

    Abstract: Acupuncture, one of the key therapeutic methods in Traditional Chinese Medicine (TCM), has been widely adopted in various clinical fields. Quantitative research on acupuncture manipulation parameters is critical to achieve standardized techniques. However, quantitative mechanical detection of acupuncture parameters remains limited. This study establishes a kinematic and dynamic model of acupunctur…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  7. arXiv:2507.04284  [pdf, ps, other]

    eess.SP cs.IT

    High-Availability Integrity Monitoring for Multi-Constellation GNSS Navigation with Non-Gaussian Errors

    Authors: Penggao Yan, Ronghe Jin, Junyi Zhang, Cheng-Wei Wang, Li-Ta Hsu

    Abstract: Global navigation satellite systems (GNSS) are essential for aviation, requiring strict integrity monitoring to alert users to hazardously misleading information. Conventional receiver autonomous integrity monitoring (RAIM) and advanced RAIM (ARAIM) rely heavily on Gaussian models in bounding nominal errors, which can be overly conservative with real-world non-Gaussian errors with heavy tails, suc…

    Submitted 6 July, 2025; originally announced July 2025.

    Comments: Submitted to IEEE Transactions on Instrumentation and Measurement

  8. arXiv:2507.03315  [pdf, ps, other]

    eess.IV cs.CV

    Towards Interpretable PolSAR Image Classification: Polarimetric Scattering Mechanism Informed Concept Bottleneck and Kolmogorov-Arnold Network

    Authors: Jinqi Zhang, Fangzhou Han, Di Zhuang, Lamei Zhang, Bin Zou, Li Yuan

    Abstract: In recent years, Deep Learning (DL) based methods have received extensive and sufficient attention in the field of PolSAR image classification, which show excellent performance. However, due to the "black-box" nature of DL methods, the interpretation of the high-dimensional features extracted and the backtracking of the decision-making process based on the features are still unresolved problems.…

    Submitted 4 July, 2025; originally announced July 2025.

  9. arXiv:2507.02437  [pdf, ps, other]

    cs.CV eess.IV

    F^2TTA: Free-Form Test-Time Adaptation on Cross-Domain Medical Image Classification via Image-Level Disentangled Prompt Tuning

    Authors: Wei Li, Jingyang Zhang, Lihao Liu, Guoan Wang, Junjun He, Yang Chen, Lixu Gu

    Abstract: Test-Time Adaptation (TTA) has emerged as a promising solution for adapting a source model to unseen medical sites using unlabeled test data, due to the high cost of data annotation. Existing TTA methods consider scenarios where data from one or multiple domains arrives in complete domain units. However, in clinical practice, data usually arrives in domain fragments of arbitrary lengths and in ran…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: This paper has been submitted to relevant journals

  10. arXiv:2507.01876  [pdf, ps, other]

    cs.IT eess.SP

    Joint Power Control and Precoding for Cell-Free Massive MIMO Systems With Sparse Multi-Dimensional Graph Neural Networks

    Authors: Yukun Ma, Jiayi Zhang, Ziheng Liu, Guowei Shi, Bo Ai

    Abstract: Cell-free massive multiple-input multiple-output (CF mMIMO) has emerged as a prominent candidate for future networks due to its ability to significantly enhance spectral efficiency by eliminating inter-cell interference. However, its practical deployment faces considerable challenges, such as high computational complexity and the optimization of its complex processing. To address these challenges,…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 5 pages, 5 figures

  11. arXiv:2507.01323  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    SWinMamba: Serpentine Window State Space Model for Vascular Segmentation

    Authors: Rongchang Zhao, Huanchi Liu, Jian Zhang

    Abstract: Vascular segmentation in medical images is crucial for disease diagnosis and surgical navigation. However, the segmented vascular structure is often discontinuous due to its slender nature and inadequate prior modeling. In this paper, we propose a novel Serpentine Window Mamba (SWinMamba) to achieve accurate vascular segmentation. The proposed SWinMamba innovatively models the continuity of slende…

    Submitted 1 July, 2025; originally announced July 2025.

  12. arXiv:2507.01055  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Prompt Mechanisms in Medical Imaging: A Comprehensive Survey

    Authors: Hao Yang, Xinlong Liang, Zhang Li, Yue Sun, Zheyu Hu, Xinghe Xie, Behdad Dashtbozorg, Jincheng Huang, Shiwei Zhu, Luyi Han, Jiong Zhang, Shanshan Wang, Ritse Mann, Qifeng Yu, Tao Tan

    Abstract: Deep learning offers transformative potential in medical imaging, yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization. Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models, providing flexible, domain-specific adaptations that significantly enhance model performa…

    Submitted 27 June, 2025; originally announced July 2025.

  13. arXiv:2507.00660  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    MTCNet: Motion and Topology Consistency Guided Learning for Mitral Valve Segmentation in 4D Ultrasound

    Authors: Rusi Chen, Yuanting Yang, Jiezhi Yao, Hongning Song, Ji Zhang, Yongsong Zhou, Yuhao Huang, Ronghao Yang, Dan Jia, Yuhan Zhang, Xing Tao, Haoran Dou, Qing Zhou, Xin Yang, Dong Ni

    Abstract: Mitral regurgitation is one of the most prevalent cardiac disorders. Four-dimensional (4D) ultrasound has emerged as the primary imaging modality for assessing dynamic valvular morphology. However, 4D mitral valve (MV) analysis remains challenging due to limited phase annotations, severe motion artifacts, and poor imaging quality. Yet, the absence of inter-phase dependency in existing methods hind…

    Submitted 3 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  14. arXiv:2507.00452  [pdf]

    eess.SY

    The impact of the following vehicles behaviors on the car following behaviors of the ego-vehicle

    Authors: Yang Liu, Jiahao Zhang, Yuxuan Ouyang, Huan Yu, Dengbo He

    Abstract: Among all types of crashes, rear-end crashes dominate, which are closely related to the car-following (CF) behaviors. Traditional CF behavior models focused on the influence of the vehicle in front, but usually ignored the peer pressure from the surrounding road users, including the following vehicle (FV). Based on an open dataset, the highD dataset, we investigated whether the FV's states can aff…

    Submitted 1 July, 2025; originally announced July 2025.

  15. arXiv:2506.23495  [pdf, ps, other]

    eess.SP

    Far-Field vs. Near-Field Propagation Channels: Key Differences and Impact on 6G XL-MIMO Performance Evaluation

    Authors: Zihang Ding, Jianhua Zhang, Changsheng You, Pan Tang, Hongbo Xing, Zhiqiang Yuan, Jie Meng, Guangyi Liu

    Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is regarded as a promising technology for next-generation communication systems. However, this will expand the near-field (NF) range, rendering more users more likely to be located in the NF region. In this paper, we aim to answer two questions: What are the new characteristics of the NF channel? Is it necessary to develop new transciv…

    Submitted 29 June, 2025; originally announced June 2025.

    Comments: 13 pages, 8 figures, 2 tables, 52 references. Note: This article has been submitted to China Communications and is currently under review

  16. arXiv:2506.22467  [pdf]

    eess.SP cs.CV

    SegmentAnyMuscle: A universal muscle segmentation model across different locations in MRI

    Authors: Roy Colglazier, Jisoo Lee, Haoyu Dong, Hanxue Gu, Yaqian Chen, Joseph Cao, Zafer Yildiz, Zhonghao Liu, Nicholas Konz, Jichen Yang, Jikai Zhang, Yuwen Chen, Lin Li, Adrian Camarena, Maciej A. Mazurowski

    Abstract: The quantity and quality of muscles are increasingly recognized as important predictors of health outcomes. While MRI offers a valuable modality for such assessments, obtaining precise quantitative measurements of musculature remains challenging. This study aimed to develop a publicly available model for muscle segmentation in MRIs and demonstrate its applicability across various anatomical locati…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 24 pages, 6 figures

  17. arXiv:2506.22012  [pdf, ps, other]

    eess.IV cs.CV

    Noise-Inspired Diffusion Model for Generalizable Low-Dose CT Reconstruction

    Authors: Qi Gao, Zhihao Chen, Dong Zeng, Junping Zhang, Jianhua Ma, Hongming Shan

    Abstract: The generalization of deep learning-based low-dose computed tomography (CT) reconstruction models to doses unseen in the training data is important and remains challenging. Previous efforts heavily rely on paired data to improve the generalization performance and robustness through collecting either diverse CT data for re-training or a few test data for fine-tuning. Recently, diffusion models have…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in Medical Image Analysis, 2025

  18. arXiv:2506.21198  [pdf, ps, other]

    cs.CV cs.RO eess.IV

    Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Xu Zheng, Hao Shi, Kunyu Peng, Hang Liu, Kailun Yang, Hui Zhang

    Abstract: Panoramic image processing is essential for omni-context perception, yet faces constraints like distortions, perspective occlusions, and limited annotations. Previous unsupervised domain adaptation methods transfer knowledge from labeled pinhole data to unlabeled panoramic images, but they require access to source pinhole data. To address these, we introduce a more practical task, i.e., Source-Fre…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted to ICCV 2025. All data and code will be made publicly available at https://github.com/yihong-97/UNLOCK

  19. arXiv:2506.17887  [pdf, ps, other]

    eess.SP

    Near-Field Propagation and Spatial Non-Stationarity Channel Model for 6-24 GHz (FR3) Extremely Large-Scale MIMO: Adopted by 3GPP for 6G

    Authors: Huixin Xu, Jianhua Zhang, Pan Tang, Hongbo Xing, Haiyang Miao, Nan Zhang, Jian Li, Jianming Wu, Wenfei Yang, Zhening Zhang, Wei Jiang, Zijian He, Afshin Haghighat, Qixing Wang, Guangyi Liu

    Abstract: Next generation cellular deployments are expected to exploit the 6-24 GHz frequency range 3 (FR3) and extremely large-scale multiple-input multiple-output (XL-MIMO) to enable ultra-high data rates and reliability. However, the significantly enlarged antenna apertures and higher carrier frequencies render the far-field and spatial stationarity assumptions in the existing 3rd generation partnership…

    Submitted 21 June, 2025; originally announced June 2025.

  20. arXiv:2506.15972  [pdf, ps, other]

    eess.SP

    Theoretical Analysis of Near-Field MIMO Channel Capacity and Mid-Band Experimental Validation

    Authors: Haiyang Miao, Jianhua Zhang, Pan Tang, Heng Wang, Lei Tian, Guangyi Liu

    Abstract: With the increase of multiple-input-multiple-output (MIMO) array size and carrier frequency, near-field MIMO communications will become crucial in 6G wireless networks. Due to the increase of MIMO near-field range, the research of near-field MIMO capacity has aroused wide interest. In this paper, we focus on the theoretical analysis and empirical study of near-field MIMO capacity. First, the near-…

    Submitted 18 June, 2025; originally announced June 2025.

  21. arXiv:2506.14165  [pdf, ps, other]

    eess.SP

    A Comprehensive Survey on Underwater Acoustic Target Positioning and Tracking: Progress, Challenges, and Perspectives

    Authors: Zhong Yang, Zhengqiu Zhu, Yong Zhao, Yonglin Tian, Changjun Fan, Runkang Guo, Wenhao Lu, Jingwei Ge, Bin Chen, Yin Zhang, Guohua Wu, Rui Wang, Gyorgy Eigner, Guangquan Cheng, Jincai Huang, Zhong Liu, Jun Zhang, Imre J. Rudas, Fei-Yue Wang

    Abstract: Underwater target tracking technology plays a pivotal role in marine resource exploration, environmental monitoring, and national defense security. Given that acoustic waves represent an effective medium for long-distance transmission in aquatic environments, underwater acoustic target tracking has become a prominent research area of underwater communications and networking. Existing literature re…

    Submitted 16 June, 2025; originally announced June 2025.

  22. arXiv:2506.13443  [pdf]

    eess.IV cs.CV

    PRO: Projection Domain Synthesis for CT Imaging

    Authors: Kang Chen, Bin Huang, Xuebin Yang, Junyan Zhang, Qiegen Liu

    Abstract: Synthesizing high quality CT projection data remains a significant challenge due to the limited availability of annotated data and the complex nature of CT imaging. In this work, we present PRO, a projection domain synthesis foundation model for CT imaging. To the best of our knowledge, this is the first study that performs CT synthesis in the projection domain. Unlike previous approaches that ope…

    Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

  23. arXiv:2506.12544  [pdf, ps, other]

    eess.SY cs.RO

    Constrained Diffusers for Safe Planning and Control

    Authors: Jichen Zhang, Liqun Zhao, Antonis Papachristodoulou, Jack Umenberger

    Abstract: Diffusion models have shown remarkable potential in planning and control tasks due to their ability to represent multimodal distributions over actions and trajectories. However, ensuring safety under constraints remains a critical challenge for diffusion models. This paper proposes Constrained Diffusers, a novel framework that incorporates constraints into pre-trained diffusion models without retr…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: 12 pages, 5 figures

  24. arXiv:2506.12073  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis

    Authors: Zongli Ye, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Haodong Li, Shuhe Li, Chenxu Guo, Anaisha Das, Peter Park, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary Miller, Maria Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Accurate alignment of dysfluent speech with intended text is crucial for automating the diagnosis of neurodegenerative speech disorders. Traditional methods often fail to model phoneme similarities effectively, limiting their performance. In this work, we propose Neural LCS, a novel approach for dysfluent text-text and speech-text alignment. Neural LCS addresses key challenges, including partial a…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted for Interspeech2025

  25. arXiv:2506.11514  [pdf, ps, other]

    eess.AS cs.SD

    Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders

    Authors: Xingwei Sun, Heinrich Dinkel, Yadong Niu, Linzhang Wang, Junbo Zhang, Jian Luan

    Abstract: Recent research has delved into speech enhancement (SE) approaches that leverage audio embeddings from pre-trained models, diverging from time-frequency masking or signal prediction techniques. This paper introduces an efficient and extensible SE method. Our approach involves initially extracting audio embeddings from noisy speech using a pre-trained audioencoder, which are then denoised by a comp…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  26. arXiv:2506.11350  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    GLAP: General contrastive audio-text pretraining across domains and languages

    Authors: Heinrich Dinkel, Zhiyong Yan, Tianzi Wang, Yongqing Wang, Xingwei Sun, Yadong Niu, Jizhong Liu, Gang Li, Junbo Zhang, Jian Luan

    Abstract: Contrastive Language Audio Pretraining (CLAP) is a widely-used method to bridge the gap between audio and text domains. Current CLAP methods enable sound and music retrieval in English, ignoring multilingual spoken content. To address this, we introduce general language audio pretraining (GLAP), which expands CLAP with multilingual and multi-domain abilities. GLAP demonstrates its versatility by a…

    Submitted 12 June, 2025; originally announced June 2025.

  27. arXiv:2506.10813  [pdf, ps, other]

    cs.CV eess.IV eess.SP

    Unsupervised Deformable Image Registration with Structural Nonparametric Smoothing

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Rongguang Wang, Jinwei Zhang, Min Liu, Yaonan Wang, Gaolei Li, Xinxing Cheng, Jinming Duan

    Abstract: Learning-based deformable image registration (DIR) accelerates alignment by amortizing traditional optimization via neural networks. Label supervision further enhances accuracy, enabling efficient and precise nonlinear alignment of unseen scans. However, images with sparse features amid large smooth regions, such as retinal vessels, introduce aperture and large-displacement challenges that unsuper…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted for publication at Information Processing in Medical Imaging (IPMI) 2025

  28. arXiv:2506.10207  [pdf, ps, other]

    cs.SD cs.DC eess.AS

    FedMLAC: Mutual Learning Driven Heterogeneous Federated Audio Classification

    Authors: Jun Bai, Rajib Rana, Di Wu, Youyang Qu, Xiaohui Tao, Ji Zhang

    Abstract: Federated Learning (FL) provides a privacy-preserving paradigm for training audio classification (AC) models across distributed clients without sharing raw data. However, Federated Audio Classification (FedAC) faces three critical challenges that substantially hinder performance: data heterogeneity, model heterogeneity, and data poisoning. While prior works have attempted to address these issues,…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: initial version

  29. Physical Layer-Based Device Fingerprinting for Wireless Security: From Theory to Practice

    Authors: Junqing Zhang, Francesco Ardizzon, Mattia Piana, Guanxiong Shen, Stefano Tomasin

    Abstract: The identification of the devices from which a message is received is part of security mechanisms to ensure authentication in wireless communications. Conventional authentication approaches are cryptography-based, which, however, are usually computationally expensive and not adequate in the Internet of Things (IoT), where devices tend to be low-cost and with limited resources. This paper provides…

    Submitted 11 June, 2025; originally announced June 2025.

  30. arXiv:2506.08404  [pdf, ps, other]

    eess.SY

    Compact Amplified Laser Power Stabilization Using Robust Active Disturbance Rejection Control with Sensor Noise Decoupling

    Authors: Yanpei Shi, Jingxuan Zhang, Zhuo Shi, Chenyao Zhang, Yuze Guo, Rui Feng

    Abstract: Laser power instability, encompassing random jitter and slow drift, severely limits the performance of optically pumped magnetometers (OPMs) in detecting ultra-weak magnetic fields, especially in large-scale OPM arrays for magnetoencephalography. Although a unified amplified laser (AL) architecture improves integration, fluctuations in the pump beam progressively degrade performance across all cha…

    Submitted 9 June, 2025; originally announced June 2025.

  31. arXiv:2506.07599  [pdf, ps, other]

    cs.IT eess.SP

    Flexible MIMO for Future Wireless Communications: Which Flexibilities are Possible?

    Authors: Zhe Wang, Jiayi Zhang, Bokai Xu, Wenhui Yi, Emil Björnson, Bo Ai

    Abstract: To enable next-generation wireless communication networks with modest spectrum availability, multiple-input multiple-output (MIMO) technology needs to undergo further evolution. In this paper, we introduce a promising next-generation wireless communication concept: flexible MIMO technology. This technology represents a MIMO technology with flexible physical configurations and integrated applicatio…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 9 pages, 5 figures, 1 table

  32. arXiv:2506.06710  [pdf, ps, other]

    cs.CV eess.IV

    A Systematic Investigation on Deep Learning-Based Omnidirectional Image and Video Super-Resolution

    Authors: Qianqian Zhao, Chunle Guo, Tianyi Zhang, Junpei Zhang, Peiyang Jia, Tan Su, Wenjie Jiang, Chongyi Li

    Abstract: Omnidirectional image and video super-resolution is a crucial research topic in low-level vision, playing an essential role in virtual reality and augmented reality applications. Its goal is to reconstruct high-resolution images or video frames from low-resolution inputs, thereby enhancing detail preservation and enabling more accurate scene analysis and interpretation. In recent years, numerous i…

    Submitted 7 June, 2025; originally announced June 2025.

  33. arXiv:2506.06360  [pdf]

    eess.SP cs.LG

    Towards Generalizable Drowsiness Monitoring with Physiological Sensors: A Preliminary Study

    Authors: Jiyao Wang, Suzan Ayas, Jiahao Zhang, Xiao Wen, Dengbo He, Birsen Donmez

    Abstract: Accurately detecting drowsiness is vital to driving safety. Among all measures, physiological-signal-based drowsiness monitoring can be more privacy-preserving than a camera-based approach. However, conflicts exist regarding how physiological metrics are associated with different drowsiness labels across datasets. Thus, we analyzed key features from electrocardiograms (ECG), electrodermal activity…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted by HFES2025

  34. arXiv:2506.05921  [pdf, ps, other]

    eess.SP

    Multi-Modal Large Models Based Beam Prediction: An Example Empowered by DeepSeek

    Authors: Yizhu Zhao, Li Yu, Lianzheng Shi, Jianhua Zhang, Guangyi Liu

    Abstract: Beam prediction is an effective approach to reduce training overhead in massive multiple-input multiple-output (MIMO) systems. However, existing beam prediction models still exhibit limited generalization ability in diverse scenarios, which remains a critical challenge. In this paper, we propose MLM-BP, a beam prediction framework based on the multi-modal large model released by DeepSeek, with ful…

    Submitted 6 June, 2025; originally announced June 2025.

  35. arXiv:2506.04594  [pdf, other]

    cs.NI cs.AI eess.SP

    Intelligent Channel Allocation for IEEE 802.11be Multi-Link Operation: When MAB Meets LLM

    Authors: Shumin Lian, Jingwen Tong, Jun Zhang, Liqun Fu

    Abstract: WiFi networks have achieved remarkable success in enabling seamless communication and data exchange worldwide. The IEEE 802.11be standard, known as WiFi 7, introduces Multi-Link Operation (MLO), a groundbreaking feature that enables devices to establish multiple simultaneous connections across different bands and channels. While MLO promises substantial improvements in network throughput and laten…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: This work has been accepted by JSAC 2025

    ACM Class: I.2.7

  36. arXiv:2506.02642  [pdf, ps, other]

    cs.IT eess.SP

    Joint Optimization based on Two-phase GNN in RIS- and DF-assisted MISO Systems with Fine-grained Rate Demands

    Authors: Huijun Tang, Jieling Zhang, Zhidong Zhao, Huaming Wu, Hongjian Sun, Pengfei Jiao

    Abstract: Reconfigurable intelligent Surfaces (RIS) and half-duplex decoded and forwarded (DF) relays can collaborate to optimize wireless signal propagation in communication systems. Users typically have different rate demands and are clustered into groups in practice based on their requirements, where the former results in the trade-off between maximizing the rate and satisfying fine-grained rate demands,…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 14 Pages, 9 figures, accepted by IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS

  37. arXiv:2506.00480  [pdf, ps, other]

    eess.SP

    The Coupling Effect of Sensing Targets on the Environment for 3GPP ISAC Channels: Observation, Modeling, and Validation

    Authors: Yameng Liu, Jianhua Zhang, Yuxiang Zhang, Hongbo Xing, Yifeng Xiong, Zhiqiang Yuan, Guangyi Liu

    Abstract: Integrated Sensing And Communication (ISAC) has been identified as a key 6G application by ITU and 3GPP, with standardization efforts already underway. Sensing tasks, such as target localization, demand more precise characterization of the sensing target (ST) in ISAC channel modeling. The ST couples complexly with environmental scatterers, potentially blocking some multipaths and generating new on…

    Submitted 31 May, 2025; originally announced June 2025.

  38. arXiv:2506.00466  [pdf, ps, other]

    eess.AS cs.SD

    M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction

    Authors: Cunhang Fan, Ying Chen, Jian Zhou, Zexu Pan, Jingjing Zhang, Youdian Gao, Xiaoke Yang, Zhengqi Wen, Zhao Lv

    Abstract: The brain-assisted target speaker extraction (TSE) aims to extract the attended speech from mixed speech by utilizing the brain neural activities, for example Electroencephalography (EEG). However, existing models overlook the issue of temporal misalignment between speech and EEG modalities, which hampers TSE performance. In addition, the speech encoder in current models typically uses basic tempo…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted to IJCAI 2025

  39. arXiv:2505.24576  [pdf, ps, other]

    eess.AS

    A Composite Predictive-Generative Approach to Monaural Universal Speech Enhancement

    Authors: Jie Zhang, Haoyin Yan, Xiaofei Li

    Abstract: It is promising to design a single model that can suppress various distortions and improve speech quality, i.e., universal speech enhancement (USE). Compared to supervised learning-based predictive methods, diffusion-based generative models have shown greater potential due to the generative capacities from degraded speech with severely damaged information. However, artifacts may be introduced in h…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing

  40. arXiv:2505.22029  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection

    Authors: Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary Miller, Jet Vonk, Brittany Morin, Maria Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency generation, existing synthetic datasets suffer from unnatural prosody and limited contextual diversity. To address these limitations, we propose LLM-Dys -- the…

    Submitted 22 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  41. arXiv:2505.21384  [pdf]

    eess.SP

    Label-free Super-Resolution Microvessel Color Flow Imaging with Ultrasound

    Authors: Zhengchang Kou, Junhang Zhang, Chen Gong, Jie Ji, Nathiya Vaithiyalingam Chandra Sekaran, Zikai Wang, Rita J. Miller, Yaoheng Yang, Daniel Adolfo Llano, Qifa Zhou, Michael L. Oelze

    Abstract: We present phase subtraction imaging (PSI), a new spatial-temporal beamforming method that enables micrometer level resolution imaging of microvessels in live animals without labels, which are microbubbles in ultrasound super-resolution imaging. Subtraction of relative phase differences between consecutive frames beamformed with mismatched apodizations is used in PSI to overcome the diffraction li…

    Submitted 27 May, 2025; originally announced May 2025.

  42. arXiv:2505.20984  [pdf, ps, other]

    eess.IV cs.CV

    Generative Image Compression by Estimating Gradients of the Rate-variable Feature Distribution

    Authors: Minghao Han, Weiyi You, Jinhua Zhang, Leheng Zhang, Ce Zhu, Shuhang Gu

    Abstract: While learned image compression (LIC) focuses on efficient data transmission, generative image compression (GIC) extends this framework by integrating generative modeling to produce photo-realistic reconstructed images. In this paper, we propose a novel diffusion-based generative modeling framework tailored for generative image compression. Unlike prior diffusion-based approaches that indirectly e…

    Submitted 27 May, 2025; originally announced May 2025.

  43. arXiv:2505.20673  [pdf, other]

    eess.SP

    A Unified RCS Modeling of Typical Targets for 3GPP ISAC Channel Standardization and Experimental Analysis

    Authors: Yuxiang Zhang, Jianhua Zhang, Xidong Hu, Jiwei Zhang, Hongbo Xing, Huiwen Gong, Shilin Luo, Yifeng Xiong, Li Yu, Zhiqing Yuan, Guangyi Liu, Tao Jiang

    Abstract: Accurate radar cross section (RCS) modeling is crucial for characterizing target scattering and improving the precision of Integrated Sensing and Communication (ISAC) channel modeling. Existing RCS models are typically designed for specific target types, leading to increased complexity and lack of generalization. This makes it difficult to standardize RCS models for 3GPP ISAC channels, which need…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 13 pages, 12 figures, 39 conferences, submitted to IEEE Journal on Selected Areas in Communications

  44. arXiv:2505.20424  [pdf, ps, other]

    cs.RO cs.AI eess.SY

    Robot Operation of Home Appliances by Reading User Manuals

    Authors: Jian Zhang, Hanbo Zhang, Anxing Xiao, David Hsu

    Abstract: Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) gr…

    Submitted 26 May, 2025; originally announced May 2025.

  45. arXiv:2505.19539  [pdf, ps, other]

    eess.SP

    Water Level Sensing via Communication Signals in a Bi-Static System

    Authors: Zhongqin Wang, J. Andrew Zhang, Kai Wu, Y. Jay Guo

    Abstract: Accurate water level sensing is essential for flood monitoring, agricultural irrigation, and water resource optimization. Traditional methods require dedicated sensor deployments, leading to high installation costs, vulnerability to interference, and limited resolution. This work proposes PMNs-WaterSense, a novel scheme leveraging Channel State Information (CSI) from existing mobile networks for w…

    Submitted 26 May, 2025; originally announced May 2025.

  46. arXiv:2505.17426  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information

    Authors: Rui Wang, Qianguo Sun, Tianrong Chen, Zhiyun Zeng, Junlong Wu, Jiaxing Zhang

    Abstract: The emergence of multi-codebook neural audio codecs such as Residual Vector Quantization (RVQ) and Group Vector Quantization (GVQ) has significantly advanced Large-Language-Model (LLM) based Text-to-Speech (TTS) systems. These codecs are crucial in separating semantic and acoustic information while efficiently harnessing semantic priors. However, since semantic and acoustic information cannot be…

    Submitted 22 May, 2025; originally announced May 2025.

  47. arXiv:2505.17421  [pdf, ps, other]

    cs.IT eess.SP

    Adaptive Implicit-Based Deep Learning Channel Estimation for 6G Communications

    Authors: Zhen Qiao, Jiang Xue, Junkai Zhang, Guanzhang Liu, Xiaoqin Ma, Runhua Li, Faheem A. Khan, John S. Thompson, Zongben Xu

    Abstract: With the widespread deployment of fifth-generation (5G) wireless networks, research on sixth-generation (6G) technology is gaining momentum. Artificial Intelligence (AI) is anticipated to play a significant role in 6G, particularly through integration with the physical layer for tasks such as channel estimation. Considering resource limitations in real systems, the AI algorithm should be designed…

    Submitted 22 May, 2025; originally announced May 2025.

  48. arXiv:2505.16369  [pdf, ps, other]

    cs.SD eess.AS

    X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

    Authors: Junbo Zhang, Heinrich Dinkel, Yadong Niu, Chenyu Liu, Si Cheng, Anbei Zhao, Jian Luan

    Abstract: We introduce X-ARES (eXtensive Audio Representation and Evaluation Suite), a novel open-source benchmark designed to systematically assess audio encoder performance across diverse domains. By encompassing tasks spanning speech, environmental sounds, and music, X-ARES provides two approaches for evaluating audio representations: linear fine-tuning and unparameterized evaluation. The fra…

    Submitted 27 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  49. arXiv:2505.16351  [pdf, other]

    eess.AS cs.AI

    Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection

    Authors: Chenxu Guo, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Shuhe Li, Zongli Ye, Hwi Joo Park, Anaisha Das, Zoe Ezzes, Jet Vonk, Brittany Morin, Rian Bogley, Lisa Wauters, Zachary Miller, Maria Gorno-Tempini, Gopala Anumanchipalli

    Abstract: Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-sh…

    Submitted 24 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Accepted for Interspeech 2025

  50. arXiv:2505.16168  [pdf, ps, other]

    cs.SD eess.AS

    Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty

    Authors: Hongfei Xue, Yufeng Tang, Jun Zhang, Xuelong Geng, Lei Xie

    Abstract: Although multilingual automatic speech recognition (ASR) systems have significantly advanced, enabling a single model to handle multiple languages, inherent linguistic differences and data imbalances challenge SOTA performance across all languages. While language identification (LID) models can route speech to the appropriate ASR model, they incur high costs from invoking SOTA commercial models an…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by INTERSPEECH 2025