Showing 1–50 of 1,474 results for author: Wang, Z

Searching in archive eess.
  1. arXiv:2509.24708  [pdf, ps, other]

    eess.AS

    SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement

    Authors: Xingchen Li, Hanke Xie, Ziqian Wang, Zihan Zhang, Longshuai Xiao, Lei Xie

    Abstract: Generative universal speech enhancement (USE) methods aim to leverage generative models to improve speech quality under various types of distortions. Diffusion- or flow-based generative models are capable of producing enhanced speech with high quality and fidelity. However, they typically achieve speech enhancement by learning an acoustic feature mapping from degraded speech to clean speech, while…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: Under review

  2. arXiv:2509.24524  [pdf, ps, other]

    cs.RO cs.AI eess.SY

    PhysiAgent: An Embodied Agent Framework in Physical World

    Authors: Zhihao Wang, Jianxiong Li, Jinliang Zheng, Wencong Zhang, Dongxiu Liu, Yinan Zheng, Haoyi Niu, Junzhi Yu, Xianyuan Zhan

    Abstract: Vision-Language-Action (VLA) models have achieved notable success but often struggle with limited generalization. To address this, integrating generalized Vision-Language Models (VLMs) as assistants to VLAs has emerged as a popular solution. However, current approaches often combine these models in rigid, sequential structures: using VLMs primarily for high-level scene understanding and task plan…

    Submitted 29 September, 2025; originally announced September 2025.

  3. arXiv:2509.23299  [pdf, ps, other]

    cs.SD eess.AS

    MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow

    Authors: Yike Zhu, Boyi Kang, Ziqian Wang, Xingchen Li, Zihan Zhang, Wenjie Li, Longshuai Xiao, Wei Xue, Lei Xie

    Abstract: Speech enhancement (SE) recovers clean speech from noisy signals and is vital for applications such as telecommunications and automatic speech recognition (ASR). While generative approaches achieve strong perceptual quality, they often rely on multi-step sampling (diffusion/flow-matching) or large language models, limiting real-time deployment. To mitigate these constraints, we present MeanFlowSE,…

    Submitted 30 September, 2025; v1 submitted 27 September, 2025; originally announced September 2025.

    Comments: Submitted to ICASSP 2026

  4. arXiv:2509.21290  [pdf, ps, other]

    eess.SP

    Vision-Intelligence-Enabled Beam Tracking for Cross-Interface Water-Air Optical Wireless Communications

    Authors: Tianqi Mao, Jiayue Liu, Weijie Liu, Dezhi Zheng, Zhaocheng Wang

    Abstract: The escalating development of oceanic applications like underwater surveillance and mineral exploration is motivating real-time wireless backhaul of the considerable observation data. Such prospects can hardly be realized by the narrowband acoustic approach. Alternatively, optical wireless communication (OWC) has emerged as a promising solution for maritime and underwater applications due to its…

    Submitted 25 September, 2025; originally announced September 2025.

  5. arXiv:2509.21118  [pdf, ps, other]

    eess.SP cs.IT

    Neural Integrated Sensing and Communication for the MIMO-OFDM Downlink

    Authors: Ziyi Wang, Frederik Zumegen, Christoph Studer

    Abstract: The ongoing convergence of spectrum and hardware requirements for wireless sensing and communication applications has fueled the integrated sensing and communication (ISAC) paradigm in next-generation networks. Neural-network-based ISAC leverages data-driven learning techniques to add sensing capabilities to existing communication infrastructure. This paper presents a novel signal-processing frame…

    Submitted 25 September, 2025; originally announced September 2025.

    Comments: To appear in the IEEE Journal on Selected Areas in Communications

  6. arXiv:2509.20030  [pdf, ps, other]

    eess.SP

    Multi-Stage CD-Kennedy Receiver for QPSK Modulated CV-QKD in Turbulent Channels

    Authors: Renzhi Yuan, Zhixing Wang, Shouye Miao, Mufei Zhao, Haifeng Yao, Bin Cao, Mugen Peng

    Abstract: Continuous variable-quantum key distribution (CV-QKD) protocols have attracted increasing attention in recent years because they enjoy a high secret key rate (SKR) and good compatibility with existing optical communication infrastructure. Classical coherent receivers are widely employed in coherent-state-based CV-QKD protocols, whose detection performance is bounded by the standard quantum limit (SQL). R…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 25 pages, 7 figures

  7. arXiv:2509.19754  [pdf, ps, other]

    eess.SP

    Timeliness-Aware Joint Source and Channel Coding for Adaptive Image Transmission

    Authors: Xiaolei Yang, Zijing Wang, Zhijin Qin, Xiaoming Tao

    Abstract: Accurate and timely image transmission is critical for emerging time-sensitive applications such as remote sensing in satellite-assisted Internet of Things. However, the bandwidth limitation poses a significant challenge in existing wireless systems, making it difficult to fulfill the requirements of both high-fidelity and low-latency image transmission. Semantic communication is expected to break…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 6 pages, 7 figures, accepted at IEEE GLOBECOM Workshops 2025

  8. arXiv:2509.18592  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation

    Authors: Neel P. Bhatt, Yunhao Yang, Rohan Siva, Pranay Samineni, Daniel Milan, Zhangyang Wang, Ufuk Topcu

    Abstract: Rapid adaptation in unseen environments is essential for scalable real-world autonomy, yet existing approaches rely on exhaustive exploration or rigid navigation policies that fail to generalize. We present VLN-Zero, a two-phase vision-language navigation framework that leverages vision-language models to efficiently construct symbolic scene graphs and enable zero-shot neurosymbolic navigation. In…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/

  9. arXiv:2509.18555  [pdf, ps, other]

    eess.SP

    A Secure Affine Frequency Division Multiplexing for Wireless Communication Systems

    Authors: Ping Wang, Zulin Wang, Yuanfang Ma, Xiaosi Tian, Yuanhan Ni

    Abstract: This paper introduces a secure affine frequency division multiplexing (SE-AFDM) for wireless communication systems to enhance communication security. Besides configuring the parameter c1 to obtain communication reliability under doubly selective channels, we also utilize the time-varying parameter c2 to improve the security of the communications system. The derived input-output relation shows that…

    Submitted 22 September, 2025; originally announced September 2025.

    Comments: 6 pages, 5 figures, 2025 IEEE International Conference on Communications

  10. arXiv:2509.17046  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories

    Authors: Haojun Yu, Youcheng Li, Zihan Niu, Nan Zhang, Xuantong Gong, Huan Li, Zhiying Zou, Haifeng Qi, Zhenxiao Cao, Zijie Lan, Xingjian Yuan, Jiating He, Haokai Zhang, Shengtao Zhang, Zicheng Wang, Dong Wang, Ziwei Zhao, Congying Chen, Yong Wang, Wangyan Qin, Qingli Zhu, Liwei Wang

    Abstract: Breast ultrasound (BUS) is an essential tool for diagnosing breast lesions, with millions of examinations per year. However, publicly available high-quality BUS benchmarks for AI development are limited in data scale and annotation richness. In this work, we present BUS-CoT, a BUS dataset for chain-of-thought (CoT) reasoning analysis, which contains 11,439 images of 10,019 lesions from 4,838 patie…

    Submitted 22 September, 2025; v1 submitted 21 September, 2025; originally announced September 2025.

  11. arXiv:2509.16963  [pdf]

    cs.RO eess.SY

    A Reliable Robot Motion Planner in Complex Real-world Environments via Action Imagination

    Authors: Chengjin Wang, Yanmin Zhou, Zhipeng Wang, Zheng Yan, Feng Luan, Shuo Jiang, Runjie Shen, Hongrui Sang, Bin He

    Abstract: Humans and animals can make real-time adjustments to movements by imagining their action outcomes to prevent unanticipated or even catastrophic motion failures in unknown unstructured environments. Action imagination, as a refined sensorimotor strategy, leverages perception-action loops to handle physical interaction-induced uncertainties in perception and system modeling within complex systems. I…

    Submitted 21 September, 2025; originally announced September 2025.

  12. arXiv:2509.15162  [pdf, ps, other]

    eess.SP

    A Unified Distributed Algorithm for Hybrid Near-Far Field Activity Detection in Cell-Free Massive MIMO

    Authors: Jingreng Lei, Yang Li, Ziyue Wang, Qingfeng Lin, Ya-Feng Liu, Yik-Chung Wu

    Abstract: A great deal of effort has recently been devoted to activity detection for massive machine-type communications in cell-free multiple-input multiple-output (MIMO) systems. However, as the number of antennas at the access points (APs) increases, the Rayleigh distance that separates the near-field and far-field regions also expands, rendering the conventional assumption of far-field propagation a…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  13. arXiv:2509.13674  [pdf]

    eess.SY

    Scaling green hydrogen and CCUS via cement-methanol co-production in China

    Authors: Yuezhang He, Hongxi Luo, Yuancheng Lin, Carl J. Talsma, Anna Li, Zhenqian Wang, Yujuan Fang, Pei Liu, Jesse D. Jenkins, Eric Larson, Zheng Li

    Abstract: High costs of green hydrogen and of carbon capture, utilization, and sequestration (CCUS) have hindered policy ambition and slowed real-world deployment, despite their importance for decarbonizing hard-to-abate sectors, including cement and methanol. Given the economic challenges of adopting CCUS in cement and green hydrogen in methanol production separately, we propose a renewable-powered co-prod…

    Submitted 17 September, 2025; originally announced September 2025.

  14. arXiv:2509.13658  [pdf, ps, other]

    eess.AS

    Assessing Data Replication in Symbolic Music via Adapted Structural Similarity Index Measure

    Authors: Shulei Ji, Zihao Wang, Le Ma, Jiaxing Yu, Kejun Zhang

    Abstract: AI-generated music may inadvertently replicate samples from the training data, raising concerns of plagiarism. Similarity measures can quantify such replication, thereby offering supervision and guidance for music generation models. Existing similarity measure methods for symbolic music mainly target melody repetition, leaving a gap in assessing complex music with rich textures and expressive perf…

    Submitted 16 September, 2025; originally announced September 2025.

  15. arXiv:2509.12748  [pdf, ps, other]

    eess.SP

    NEFT: A Unified Transformer Framework for Efficient Near-Field CSI Feedback in XL-MIMO Systems

    Authors: Haiyang Li, Tianqi Mao, Pengyu Wang, Ruiqi Liu, Shunyu Li, Zhaocheng Wang

    Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) systems, operating in the near-field region due to their massive antenna arrays, are a key enabler of next-generation wireless communications but face significant challenges in channel state information (CSI) feedback. Deep learning has emerged as a powerful tool by learning compact CSI representations for feedback. However, existing m…

    Submitted 16 September, 2025; originally announced September 2025.

  16. arXiv:2509.11516  [pdf]

    cs.RO eess.SY

    PaiP: An Operational Aware Interactive Planner for Unknown Cabinet Environments

    Authors: Chengjin Wang, Zheng Yan, Yanmin Zhou, Runjie Shen, Zhipeng Wang, Bin Cheng, Bin He

    Abstract: Box/cabinet scenarios with stacked objects pose significant challenges for robotic motion due to visual occlusions and constrained free space. Traditional collision-free trajectory planning methods often fail when no collision-free paths exist, and may even lead to catastrophic collisions caused by invisible objects. To overcome these challenges, we propose an operational aware interactive motion…

    Submitted 14 September, 2025; originally announced September 2025.

  17. arXiv:2509.10666  [pdf, ps, other]

    eess.SP

    Uplink and Downlink Communications in Segmented Waveguide-Enabled Pinching-Antenna Systems (SWANs)

    Authors: Chongjun Ouyang, Hao Jiang, Zhaolin Wang, Yuanwei Liu, Zhiguo Ding

    Abstract: A segmented waveguide-enabled pinching-antenna system (SWAN) is proposed, in which a segmented waveguide composed of multiple short dielectric waveguide segments is employed to radiate or receive signals through the pinching antennas (PAs) deployed on each segment. Based on this architecture, three practical operating protocols are proposed: segment selection (SS), segment aggregation (SA), and se…

    Submitted 12 September, 2025; originally announced September 2025.

    Comments: Submitted to IEEE journal

  18. arXiv:2509.10296  [pdf, ps, other]

    eess.SP

    Low-Complexity Null-Space-Based Simultaneous Wireless Information and Power Transfer Scheme

    Authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang, Zhiqin Wang

    Abstract: Simultaneous wireless information and power transfer (SWIPT) has attracted sustained interest. We propose a null-space-based transmission scheme for multiuser SWIPT serving both energy users (EUs) and information users (IUs). Under a practical nonlinear energy-harvesting (EH) model and multiple waveform options, we revisit the role of dedicated energy beams (EBs). We show that, in general, dedicat…

    Submitted 12 September, 2025; originally announced September 2025.

  19. arXiv:2509.06425  [pdf]

    eess.SY

    First-Principle Modeling Framework of Boost Converter Dynamics for Precise Energy Conversions in Space

    Authors: Yifan Wang, Wenhua Li, Zhenlong Wang, Xinrui Zhang, Jianfeng Sun, Qianfu Xia, Zhongtao Gou, Jiangang Rong, Tao Ye

    Abstract: Boost converters are essential for modern electrification and intelligent technologies. However, conventional Boost converter models relying on steady-state assumptions fail to accurately predict transient behaviors during input voltage and load fluctuations, which cause significant output voltage overshoots and instability, resulting in failures of electrical systems, thereby restricting their us…

    Submitted 8 September, 2025; originally announced September 2025.

    Comments: 24 pages, 30 pages supplementary material, 5 figures, 14 supplementary figures, 6 supplementary tables

  20. arXiv:2509.06413  [pdf, ps, other]

    cs.CV eess.IV

    VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results

    Authors: Yixiao Li, Xin Li, Chris Wei Zhou, Shuo Xing, Hadi Amirpour, Xiaoshuai Hao, Guanghui Yue, Baoquan Zhao, Weide Liu, Xiaoyuan Yang, Zhengzhong Tu, Xinyu Li, Chuanbiao Song, Chenqi Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang, Xiaoyan Sun, Shishun Tian, Dongyang Yan, Weixia Zhang, Junlin Chen, Wei Sun, Zhihua Wang, Zhuohang Shi, et al. (6 additional authors not shown)

    Abstract: This paper presents the ISRGC-Q Challenge, built upon the Image Super-Resolution Generated Content Quality Assessment (ISRGen-QA) dataset, and organized as part of the Visual Quality Assessment (VQualA) Competition at the ICCV 2025 Workshops. Unlike existing Super-Resolution Image Quality Assessment (SR-IQA) datasets, ISRGen-QA places a greater emphasis on SR images generated by the latest generat…

    Submitted 8 September, 2025; originally announced September 2025.

    Comments: 11 pages, 12 figures, VQualA ICCV Workshop

  21. arXiv:2509.06170  [pdf, ps, other]

    eess.SP

    Pinching Antenna System (PASS) Enhanced Covert Communications: Against Warden via Sensing

    Authors: Hao Jiang, Zhaolin Wang, Yuanwei Liu, Arumugam Nallanathan, Zhiguo Ding

    Abstract: A sensing-aided covert communication network empowered by pinching antenna systems (PASS) is proposed in this work. Unlike conventional fixed-position MIMO arrays, PASS dynamically reconfigures its pinching antennas (PAs) closer to the legitimate user, substantially enhancing covertness. To further secure the adversary's channel state information (CSI), a sensing function is leveraged to track the…

    Submitted 7 September, 2025; originally announced September 2025.

    Comments: Submitted to an IEEE journal for possible publication

  22. arXiv:2509.05971  [pdf, ps, other]

    eess.SP cs.MM

    DeepStream: Prototyping Deep Joint Source-Channel Coding for Real-Time Multimedia Transmissions

    Authors: Kaiyi Chi, Yinghui He, Qianqian Yang, Zhiping Jiang, Yuanchao Shu, Zhiqin Wang, Jun Luo, Jiming Chen

    Abstract: Deep learning-based joint source-channel coding (DeepJSCC) has emerged as a promising technique in 6G for enhancing the efficiency and reliability of data transmission across diverse modalities, particularly in low signal-to-noise ratio (SNR) environments. This advantage is realized by leveraging powerful neural networks to learn an optimal end-to-end mapping from the source data directly to the t…

    Submitted 7 September, 2025; originally announced September 2025.

    Comments: 13 pages, 43 figures

  23. arXiv:2509.04870  [pdf, ps, other]

    eess.IV cs.CV

    Multi-modal Uncertainty Robust Tree Cover Segmentation For High-Resolution Remote Sensing Images

    Authors: Yuanyuan Gui, Wei Li, Yinjian Wang, Xiang-Gen Xia, Mauro Marty, Christian Ginzler, Zuyuan Wang

    Abstract: Recent advances in semantic segmentation of multi-modal remote sensing images have significantly improved the accuracy of tree cover mapping, supporting applications in urban planning, forest monitoring, and ecological assessment. Integrating data from multiple modalities, such as optical imagery, light detection and ranging (LiDAR), and synthetic aperture radar (SAR), has shown superior performance…

    Submitted 5 September, 2025; originally announced September 2025.

  24. arXiv:2509.03421  [pdf]

    eess.IV cs.CV

    Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics

    Authors: Yukun Zhou, Paul Nderitu, Jocelyn Hui Lin Goh, Justin Engelmann, Siegfried K. Wagner, Anran Ran, Hongyang Jiang, Lie Ju, Ke Zou, Sahana Srinivasan, Hyunmin Kim, Takahiro Ninomiya, Zheyuan Wang, Gabriel Dawei Yang, Eden Ruffell, Dominic Williamson, Rui Santos, Gabor Mark Somfai, Carol Y. Cheung, Tien Yin Wong, Daniel C. Alexander, Yih Chung Tham, Pearse A. Keane

    Abstract: Medical foundation models, pre-trained with large-scale clinical data, demonstrate strong performance in diverse clinically relevant applications. RETFound, trained on nearly one million retinal images, exemplifies this approach in applications with retinal images. However, the emergence of increasingly powerful and multifold larger generalist foundation models such as DINOv2 and DINOv3 raises the…

    Submitted 3 September, 2025; originally announced September 2025.

    Comments: 39 pages, 8 Figures

    ACM Class: J.3; I.2.10

  25. arXiv:2509.02402  [pdf, ps, other]

    eess.IV

    autoPET IV challenge: Incorporating organ supervision and human guidance for lesion segmentation in PET/CT

    Authors: Junwei Huang, Yingqi Hao, Yitong Luo, Ziyu Wang, Mingxuan Liu, Yifei Chen, Yuanhan Wang, Lei Xiang, Qiyuan Tian

    Abstract: Lesion Segmentation in PET/CT scans is an essential part of modern oncological workflows. To address the challenges of time-intensive manual annotation and high inter-observer variability, the autoPET challenge series seeks to advance automated segmentation methods in complex multi-tracer and multi-center settings. Building on this foundation, autoPET IV introduces a human-in-the-loop scenario to…

    Submitted 2 September, 2025; originally announced September 2025.

  26. arXiv:2509.02116  [pdf, ps, other]

    eess.SP

    Affine-Doppler Division Multiplexing for High-Mobility Wireless Communications Systems

    Authors: Yuanfang Ma, Zulin Wang, Peng Yuan, Qin Huang, Yuanhan Ni

    Abstract: Affine Frequency Division Multiplexing (AFDM) has been regarded as a candidate integrated sensing and communications (ISAC) waveform owing to its superior communication performance, outperforming the Orthogonal Time-Frequency Space (OTFS) that has been researched for a longer time. However, since the above two waveforms are incompatible with each other, the state-of-the-art methods well-designed f…

    Submitted 4 September, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

    Comments: 7 pages, 4 figures, 1 table

  27. arXiv:2509.01905  [pdf, ps, other]

    eess.SP

    Efficient River Water Level Sensing Using Cellular CSI and Joint Space-Time Processing

    Authors: Khawaja Fahad Masood, Kai Wu, Zhongqin Wang, J. Andrew Zhang, Shu-Lin Chen, Y. Jay Guo

    Abstract: Accurate and timely water level monitoring is critical for flood prevention, environmental management, and emerging smart infrastructure systems. Traditional water sensing methods often rely on dedicated sensors, which can be costly to deploy and difficult to maintain and are vulnerable to damage during floods. In this work, we propose a novel cellular signal-based sensing scheme that passively esti…

    Submitted 1 September, 2025; originally announced September 2025.

    Comments: 12 pages, 13 figures, submitted to an IEEE journal for possible publication

  28. arXiv:2509.01217  [pdf, ps, other]

    eess.IV cs.CV

    Learn2Reg 2024: New Benchmark Datasets Driving Progress on New Challenges

    Authors: Lasse Hansen, Wiebke Heyer, Christoph Großbröhmer, Frederic Madesta, Thilo Sentker, Wang Jiazheng, Yuxi Zhang, Hang Zhang, Min Liu, Junyi Wang, Xi Zhu, Yuhua Li, Liwen Wang, Daniil Morozov, Nazim Haouchine, Joel Honkamaa, Pekka Marttinen, Yichao Zhou, Zuopeng Tan, Zhuoyuan Wang, Yi Wang, Hongchao Zhou, Shunbo Hu, Yi Zhang, Qian Tao, et al. (29 additional authors not shown)

    Abstract: Medical image registration is critical for clinical applications, and fair benchmarking of different methods is essential for monitoring ongoing progress. To date, the Learn2Reg 2020-2023 challenges have released several complementary datasets and established metrics for evaluations. However, these editions did not capture all aspects of the registration problem, particularly in terms of modality…

    Submitted 8 September, 2025; v1 submitted 1 September, 2025; originally announced September 2025.

    Comments: Submitted to MELBA Journal. v2: added Jinming Duan to author list

  29. arXiv:2509.00964  [pdf, ps, other]

    eess.SP

    Doubly-Dispersive Continuous MIMO Systems: Channel Modeling and Beamforming Design

    Authors: Kuranage Roche Rayan Ranasinghe, Zhaolin Wang, Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Emil Björnson

    Abstract: We address the modeling and optimal beamforming (BF) design for multiple-input multiple-output (MIMO) continuous aperture array (CAPA) systems operating over doubly-dispersive (DD) channels. First, a comprehensive DD continuous MIMO (DDC MIMO) channel model that incorporates CAPAs at both the transmitter (TX) and receiver (RX) is derived, which is used to obtain explicit input-output (I/O) relatio…

    Submitted 4 September, 2025; v1 submitted 31 August, 2025; originally announced September 2025.

    Comments: Submitted to IEEE Transactions on Wireless Communications

  30. arXiv:2509.00314  [pdf, ps, other]

    eess.SP

    CoMET: A Contrastive-Masked Brain Foundation Model for Universal EEG Representation

    Authors: Ang Li, Zikai Wang, Liuyin Yang, Zhenyu Wang, Tianheng Xu, Honglin Hu, Marc M. Van Hulle

    Abstract: Electroencephalography (EEG) is a non-invasive technique for recording brain activity, widely used in brain-computer interfaces, clinical settings, and healthcare. Traditional EEG deep models typically focus on a specific dataset and task, limiting model size and generalization. Recently, self-supervised brain foundation models have emerged and been applied to various downstream tasks. Nevertheless, these mode…

    Submitted 29 August, 2025; originally announced September 2025.

  31. arXiv:2508.20288  [pdf, ps, other]

    eess.SY cs.LG

    Neural Spline Operators for Risk Quantification in Stochastic Systems

    Authors: Zhuoyuan Wang, Raffaele Romagnoli, Kamyar Azizzadenesheli, Yorie Nakahira

    Abstract: Accurately quantifying long-term risk probabilities in diverse stochastic systems is essential for safety-critical control. However, existing sampling-based and partial differential equation (PDE)-based methods often struggle to handle complex varying dynamics. Physics-informed neural networks learn surrogate mappings for risk probabilities from varying system parameters of fixed and finite dimens…

    Submitted 27 August, 2025; originally announced August 2025.

  32. arXiv:2508.20141  [pdf]

    eess.IV cs.AI cs.CV

    UltraEar: a multicentric, large-scale database combining ultra-high-resolution computed tomography and clinical data for ear diseases

    Authors: Ruowei Tang, Pengfei Zhao, Xiaoguang Li, Ning Xu, Yue Cheng, Mengshi Zhang, Zhixiang Wang, Zhengyu Zhang, Hongxia Yin, Heyu Ding, Shusheng Gong, Yuhe Liu, Zhenchang Wang

    Abstract: Ear diseases affect billions of people worldwide, leading to substantial health and socioeconomic burdens. Computed tomography (CT) plays a pivotal role in accurate diagnosis, treatment planning, and outcome evaluation. The objective of this study is to present the establishment and design of UltraEar Database, a large-scale, multicentric repository of isotropic 0.1 mm ultra-high-resolution CT (U-…

    Submitted 27 August, 2025; originally announced August 2025.

  33. arXiv:2508.19644  [pdf, ps, other]

    eess.SY

    Low-Cost Architecture and Efficient Pattern Synthesis for Polarimetric Phased Array Based on Polarization Coding Reconfigurable Elements

    Authors: Yiqing Wang, Jian Zhou, Chen Pang, Wenyang Man, Zixiang Xiong, Ke Meng, Zhanling Wang, Yongzhen Li

    Abstract: Polarimetric phased arrays (PPAs) enhance radar target detection and anti-jamming capabilities. However, the dual transmit/receive (T/R) channel requirement leads to high costs and system complexity. To address this, this paper introduces a polarization-coding reconfigurable phased array (PCRPA) and associated pattern synthesis techniques to reduce PPA costs while minimizing performance degradatio…

    Submitted 28 August, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

  34. arXiv:2508.19205  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    VibeVoice Technical Report

    Authors: Zhiliang Peng, Jianwei Yu, Wenhui Wang, Yaoyao Chang, Yutao Sun, Li Dong, Yi Zhu, Weijiang Xu, Hangbo Bao, Zehua Wang, Shaohan Huang, Yan Xia, Furu Wei

    Abstract: This report presents VibeVoice, a novel model designed to synthesize long-form speech with multiple speakers by employing next-token diffusion, which is a unified method for modeling continuous data by autoregressively generating latent vectors via diffusion. To enable this, we introduce a novel continuous speech tokenizer that, when compared to the popular Encodec model, improves data compression…

    Submitted 26 August, 2025; originally announced August 2025.

  35. arXiv:2508.18653  [pdf, ps, other]

    cs.LG cs.AI cs.SD eess.AS

    The Sound of Risk: A Multimodal Physics-Informed Acoustic Model for Forecasting Market Volatility and Enhancing Market Interpretability

    Authors: Xiaoliang Chen, Xin Yu, Le Chang, Teng Jing, Jiashuai He, Ze Wang, Yangjun Luo, Xingyu Chen, Jiayue Liang, Yuchen Wang, Jiaying Xie

    Abstract: Information asymmetry in financial markets, often amplified by strategically crafted corporate narratives, undermines the effectiveness of conventional textual analysis. We propose a novel multimodal framework for financial risk assessment that integrates textual sentiment with paralinguistic cues derived from executive vocal tract dynamics in earnings calls. Central to this framework is the Physi…

    Submitted 25 August, 2025; originally announced August 2025.

    Comments: 9 pages, 6 figures

    MSC Class: 62P05; 68T0

    ACM Class: I.2.7; J.4

  36. arXiv:2508.18295  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems

    Authors: Huangyu Dai, Lingtao Mao, Ben Chen, Zihan Wang, Zihan Liang, Ying Han, Chenyi Lei, Han Li

    Abstract: Hotword customization is crucial in ASR to enhance the accuracy of domain-specific terms. It has been primarily driven by the advancements in traditional models and Audio large language models (LLMs). However, existing models often struggle with large-scale hotwords, as the recognition rate drops dramatically as the number of hotwords increases. In this paper, we introduce a novel hotword custo…

    Submitted 22 August, 2025; originally announced August 2025.

  37. arXiv:2508.16569  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer

    Authors: Yuhui Tao, Zhongwei Zhao, Zilong Wang, Xufang Luo, Feng Chen, Kang Wang, Chuanfu Wu, Xue Zhang, Shaoting Zhang, Jiaxi Yao, Xingwei Jin, Xinyang Jiang, Yifan Yang, Dongsheng Li, Lili Qiu, Zhiqiang Shao, Jianming Guo, Nengwang Yu, Shuo Wang, Ying Xiong

    Abstract: The non-invasive assessment of increasingly incidentally discovered renal masses is a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors. In this study, we developed and validated RenalCLIP using a dataset of 27,866 CT scans from 8,809 patients across nine Chinese medical centers and the public TCIA cohort, a vis…

    Submitted 22 August, 2025; originally announced August 2025.

  38. arXiv:2508.13479  [pdf, ps, other]

    cs.CV eess.IV

    AIM 2025 challenge on Inverse Tone Mapping Report: Methods and Results

    Authors: Chao Wang, Francesco Banterle, Bin Ren, Radu Timofte, Xin Lu, Yufeng Peng, Chengjie Ge, Zhijing Sun, Ziang Zhou, Zihao Li, Zishun Liao, Qiyu Kang, Xueyang Fu, Zheng-Jun Zha, Zhijing Sun, Xingbo Wang, Kean Liu, Senyan Xu, Yang Qiu, Yifan Ding, Gabriel Eilertsen, Jonas Unger, Zihao Wang, Ke Wu, Jinshan Pan, et al. (4 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the AIM 2025 Challenge on Inverse Tone Mapping (ITM). The challenge aimed to push forward the development of effective ITM algorithms for HDR image reconstruction from single LDR inputs, focusing on perceptual fidelity and numerical consistency. A total of 67 participants submitted 319 valid results, from which the best five teams wer…

    Submitted 21 September, 2025; v1 submitted 18 August, 2025; originally announced August 2025.

  39. arXiv:2508.13306  [pdf, ps, other]

    eess.SY

    Stochastic Black Start Resource Allocation to Enable Dynamic Formation of Networked Microgrids and DER-aided Restoration

    Authors: Cong Bai, Salish Maharjan, Han Wang, Zhaoyu Wang

    Abstract: Extended outages in distributed systems (DSs) dominated by distributed energy resources (DERs) require innovative strategies to efficiently and securely deploy black start (BS) resources. To address the need, this paper proposes a two-stage stochastic resource allocation method within synchronizing dynamic microgrids (MGs) for black start (SDMG-BS), enabling risk-averse and adaptive restoration ac…

    Submitted 18 August, 2025; originally announced August 2025.

  40. arXiv:2508.13090  [pdf, ps, other]

    eess.SY

    Exploiting Convexity of Neural Networks in Dynamic Operating Envelope Optimization for Distributed Energy Resources

    Authors: Hongyi Li, Liming Liu, Yunyi Li, Zhaoyu Wang

    Abstract: The increasing penetration of distributed energy resources (DERs) brings opportunities and challenges to the operation of distribution systems. To ensure network integrity, dynamic operating envelopes (DOEs) are issued by utilities to DERs as their time-varying export/import power limits. Due to the non-convex nature of power flow equations, the optimization of DOEs faces a dilemma of solution acc…

    Submitted 18 August, 2025; originally announced August 2025.

  41. arXiv:2508.12937  [pdf, ps, other]

    eess.SY

    Grid Edge Intelligence-Assisted Model Predictive Framework for Black Start of Distribution Systems with Inverter-Based Resources

    Authors: Junyuan Zheng, Salish Maharjan, Zhaoyu Wang

    Abstract: The growing proliferation of distributed energy resources (DERs) is significantly enhancing the resilience and reliability of distribution systems. However, a substantial portion of behind-the-meter (BTM) DERs is often overlooked during black start (BS) and restoration processes. Existing BS strategies that utilize grid-forming (GFM) battery energy storage systems (BESS) frequently ignore critical…

    Submitted 18 August, 2025; originally announced August 2025.

    Comments: This manuscript has been submitted to IEEE Transactions on Smart Grid

  42. arXiv:2508.12728  [pdf, ps, other]

    eess.SP

    LLM-RIMSA: Large Language Models driven Reconfigurable Intelligent Metasurface Antenna Systems

    Authors: Yunsong Huang, Hui-Ming Wang, Qingli Yan, Zhaowei Wang

    Abstract: The evolution of 6G networks demands ultra-massive connectivity and intelligent radio environments, yet existing reconfigurable intelligent surface (RIS) technologies face critical limitations in hardware efficiency, dynamic control, and scalability. This paper introduces LLM-RIMSA, a transformative framework that integrates large language models (LLMs) with a novel reconfigurable intelligent meta…

    Submitted 18 August, 2025; originally announced August 2025.

  43. arXiv:2508.12614  [pdf, ps, other]

    eess.SP cs.HC cs.LG

    Towards SISO Bistatic Sensing for ISAC

    Authors: Zhongqin Wang, J. Andrew Zhang, Kai Wu, Min Xu, Y. Jay Guo

    Abstract: Integrated Sensing and Communication (ISAC) is a key enabler for next-generation wireless systems. However, real-world deployment is often limited to low-cost, single-antenna transceivers. In such a bistatic Single-Input Single-Output (SISO) setup, clock asynchrony introduces random phase offsets in Channel State Information (CSI), which cannot be mitigated using conventional multi-antenna methods.…

    Submitted 18 August, 2025; originally announced August 2025.

  44. arXiv:2508.12408  [pdf, ps, other]

    eess.SY

    Data-driven quantification and visualization of resilience metrics of power distribution system

    Authors: Dingwei Wang, Salish Maharjan, Junyuan Zheng, Liming Liu, Zhaoyu Wang

    Abstract: This paper presents a data-driven approach for quantifying the resilience of distribution power grids to extreme weather events using two key metrics: (a) the number of outages and (b) restoration time. The method leverages historical outage records maintained by power utilities and weather measurements collected by the National Oceanic and Atmospheric Administration (NOAA) to evaluate resilience…

    Submitted 17 August, 2025; originally announced August 2025.

    Comments: This paper has been submitted to Nature Communication Engineering

  45. arXiv:2508.12395  [pdf, ps, other]

    cs.RO eess.SY

    PUB: A Plasma-Propelled Ultra-Quiet Blimp with Two-DOF Vector Thrusting

    Authors: Zihan Wang

    Abstract: This study presents the design and control of a Plasma-propelled Ultra-silence Blimp (PUB), a novel aerial robot employing plasma vector propulsion for ultra-quiet flight without mechanical propellers. The system utilizes a helium-lift platform for extended endurance and a four-layer ring asymmetric capacitor to generate ionic wind thrust. The modular propulsion units allow flexible configuration…

    Submitted 28 August, 2025; v1 submitted 17 August, 2025; originally announced August 2025.

  46. arXiv:2508.12320  [pdf, ps, other]

    eess.SP

    Jamming Identification with Differential Transformer for Low-Altitude Wireless Networks

    Authors: Pengyu Wang, Zhaocheng Wang, Tianqi Mao, Weijie Yuan, Haijun Zhang, George K. Karagiannidis

    Abstract: Wireless jamming identification, which detects and classifies electromagnetic jamming from non-cooperative devices, is crucial for emerging low-altitude wireless networks consisting of many drone terminals that are highly susceptible to electromagnetic jamming. However, jamming identification schemes adopting deep learning (DL) are vulnerable to attacks involving carefully crafted adversarial samp…

    Submitted 17 August, 2025; originally announced August 2025.

  47. arXiv:2508.12190  [pdf, ps, other]

    eess.IV cs.CV

    DermINO: Hybrid Pretraining for a Versatile Dermatology Foundation Model

    Authors: Jingkai Xu, De Cheng, Xiangqian Zhao, Jungang Yang, Zilong Wang, Xinyang Jiang, Xufang Luo, Lili Chen, Xiaoli Ning, Chengxu Li, Xinzhu Zhou, Xuejiao Song, Ang Li, Qingyue Xia, Zhou Zhuang, Hongfei Ouyang, Ke Xue, Yujun Sheng, Rusong Meng, Feng Xu, Xi Yang, Weimin Ma, Yusheng Lee, Dongsheng Li, Xinbo Gao, et al. (5 additional authors not shown)

    Abstract: Skin diseases impose a substantial burden on global healthcare systems, driven by their high prevalence (affecting up to 70% of the population), complex diagnostic processes, and a critical shortage of dermatologists in resource-limited areas. While artificial intelligence (AI) tools have demonstrated promise in dermatological image analysis, current models face limitations: they often rely on large…

    Submitted 24 September, 2025; v1 submitted 16 August, 2025; originally announced August 2025.

  48. arXiv:2508.11886  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG eess.IV

    EVTP-IVS: Effective Visual Token Pruning For Unifying Instruction Visual Segmentation In Multi-Modal Large Language Models

    Authors: Wenhui Zhu, Xiwen Chen, Zhipeng Wang, Shao Tang, Sayan Ghosh, Xuanzhao Dong, Rajat Koner, Yalin Wang

    Abstract: Instructed Visual Segmentation (IVS) tasks require segmenting objects in images or videos based on natural language instructions. While recent multimodal large language models (MLLMs) have achieved strong performance on IVS, their inference cost remains a major bottleneck, particularly in video. We empirically analyze visual token sampling in MLLMs and observe a strong correlation between subset t…

    Submitted 15 August, 2025; originally announced August 2025.

  49. arXiv:2508.11115  [pdf, ps, other]

    cs.CV cs.HC eess.SP

    UWB-PostureGuard: A Privacy-Preserving RF Sensing System for Continuous Ergonomic Sitting Posture Monitoring

    Authors: Haotang Li, Zhenyu Qi, Sen He, Kebin Peng, Sheng Tan, Yili Ren, Tomas Cerny, Jiyue Zhao, Zi Wang

    Abstract: Improper sitting posture during prolonged computer use has become a significant public health concern. Traditional posture monitoring solutions face substantial barriers, including privacy concerns with camera-based systems and user discomfort with wearable sensors. This paper presents UWB-PostureGuard, a privacy-preserving ultra-wideband (UWB) sensing system that advances mobile technologies for…

    Submitted 14 August, 2025; originally announced August 2025.

  50. arXiv:2508.10830  [pdf, ps, other]

    cs.SD eess.AS

    Advances in Speech Separation: Techniques, Challenges, and Future Trends

    Authors: Kai Li, Guo Chen, Wendi Sang, Yi Luo, Zhuo Chen, Shuai Wang, Shulin He, Zhong-Qiu Wang, Andong Li, Zhiyong Wu, Xiaolin Hu

    Abstract: The field of speech separation, addressing the "cocktail party problem", has seen revolutionary advances with DNNs. Speech separation enhances clarity in complex acoustic environments and serves as crucial pre-processing for speech recognition and speaker recognition. However, current literature focuses narrowly on specific architectures or isolated approaches, creating fragmented understanding. T…

    Submitted 14 August, 2025; originally announced August 2025.

    Comments: 34 pages, 10 figures