
Showing 1–50 of 204 results for author: Wu, W

Searching in archive eess.
  1. arXiv:2506.20945  [pdf, ps, other]

    cs.SD eess.AS

    A Multi-Stage Framework for Multimodal Controllable Speech Synthesis

    Authors: Rui Niu, Weihao Wu, Jie Chen, Long Ma, Zhiyong Wu

    Abstract: Controllable speech synthesis aims to control the style of generated speech using reference input, which can be of various modalities. Existing face-based methods struggle with robustness and generalization due to data quality constraints, while text prompt methods offer limited diversity and fine-grained control. Although multimodal approaches aim to integrate various modalities, their reliance o…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by ICME2025

  2. arXiv:2506.20158  [pdf, ps, other]

    cs.IT eess.SP

    Efficient Channel Estimation for Rotatable Antenna-Enabled Wireless Communication

    Authors: Xue Xiong, Beixiong Zheng, Wen Wu, Xiaodan Shao, Liang Dai, Ming-Min Zhao, Jie Tang

    Abstract: Non-fixed flexible antenna architectures, such as fluid antenna system (FAS), movable antenna (MA), and pinching antenna, have garnered significant interest in recent years. Among them, rotatable antenna (RA) is a promising antenna architecture that exploits additional spatial degrees of freedom (DoFs) to enhance the communication performance. To fully obtain the performance gain provided by RAs,…

    Submitted 29 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: 5 pages, 4 figures

  3. arXiv:2506.18404  [pdf, ps, other]

    eess.IV

    SafeClick: Error-Tolerant Interactive Segmentation of Any Medical Volumes via Hierarchical Expert Consensus

    Authors: Yifan Gao, Jiaxi Sheng, Wenbin Wu, Haoyue Li, Yaoxian Dong, Chaoyang Ge, Feng Yuan, Xin Gao

    Abstract: Foundation models for volumetric medical image segmentation have emerged as powerful tools in clinical workflows, enabling radiologists to delineate regions of interest through intuitive clicks. While these models demonstrate promising capabilities in segmenting previously unseen anatomical structures, their performance is strongly influenced by prompt quality. In clinical settings, radiologists o…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: MICCAI 2025

  4. arXiv:2506.12935  [pdf, ps, other]

    cs.CL cs.MM cs.SD eess.AS

    SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models

    Authors: Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui

    Abstract: While large language models have shown reasoning capabilities, their application to the audio modality, particularly in large audio-language models (ALMs), remains significantly underdeveloped. Addressing this gap requires a systematic approach, involving a capable base model, high-quality reasoning-oriented audio data, and effective training algorithms. In this study, we present a comprehensive s…

    Submitted 15 June, 2025; originally announced June 2025.

  5. arXiv:2506.09792  [pdf, ps, other]

    cs.SD cs.LG cs.MM eess.AS

    Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction

    Authors: Wenxuan Wu, Shuai Wang, Xixin Wu, Helen Meng, Haizhou Li

    Abstract: Audio-visual target speaker extraction (AV-TSE) models primarily rely on target visual cues to isolate the target speaker's voice from others. We know that humans leverage linguistic knowledge, such as syntax and semantics, to support speech perception. Inspired by this, we explore the potential of pre-trained speech-language models (PSLMs) and pre-trained language models (PLMs) as auxiliary knowl…

    Submitted 15 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  6. arXiv:2506.01319  [pdf, ps, other]

    cs.SD cs.CV cs.MM eess.AS

    Learning Sparsity for Effective and Efficient Music Performance Question Answering

    Authors: Xingjian Diao, Tianzhen Yang, Chunhui Zhang, Weiyi Wu, Ming Cheng, Jiang Gui

    Abstract: Music performances, characterized by dense and continuous audio as well as seamless audio-visual integration, present unique challenges for multimodal scene understanding and reasoning. Recent Music Performance Audio-Visual Question Answering (Music AVQA) datasets have been proposed to reflect these challenges, highlighting the continued need for more effective integration of audio-visual represen…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to the main conference of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

  7. arXiv:2506.00350  [pdf, ps, other]

    cs.SD eess.AS

    DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model

    Authors: Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to convert dysarthric speech into comprehensible speech while maintaining the speaker's identity. Despite significant advancements, existing methods often struggle with low speech intelligibility and poor speaker similarity. In this study, we introduce a novel diffusion-based DSR system that leverages a latent diffusion model to enhance the quality of sp…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  8. arXiv:2505.24314  [pdf, ps, other]

    cs.SD eess.AS

    DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec

    Authors: Peijie Chen, Wenhao Guan, Kaidi Wang, Weijie Wu, Hukai Huang, Qingyang Hong, Lin Li

    Abstract: Neural speech codecs are essential for advancing text-to-speech (TTS) systems. With the recent success of large language models in text generation, developing high-quality speech tokenizers has become increasingly important. This paper introduces DS-Codec, a novel neural speech codec featuring a dual-stage training framework with mirror and non-mirror architectures switching, designed to achieve s…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted to Interspeech 2025

  9. arXiv:2505.24291  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Discl-VC: Disentangled Discrete Tokens and In-Context Learning for Controllable Zero-Shot Voice Conversion

    Authors: Kaidi Wang, Wenhao Guan, Ziyue Jiang, Hukai Huang, Peijie Chen, Weijie Wu, Qingyang Hong, Lin Li

    Abstract: Currently, zero-shot voice conversion systems are capable of synthesizing the voice of unseen speakers. However, most existing approaches struggle to accurately replicate the speaking style of the source speaker or mimic the distinctive speaking style of the target speaker, thereby limiting the controllability of voice conversion. In this work, we propose Discl-VC, a novel voice conversion framewo…

    Submitted 30 May, 2025; originally announced May 2025.

  10. arXiv:2505.20638  [pdf, ps, other]

    cs.SD cs.CV cs.MM eess.AS

    Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs

    Authors: Wenhao You, Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Zhongyu Ouyang, Chiyu Ma, Tingxuan Wu, Noah Wei, Zong Ke, Ming Cheng, Soroush Vosoughi, Jiang Gui

    Abstract: While recent Multimodal Large Language Models exhibit impressive capabilities for general multimodal tasks, specialized domains like music necessitate tailored approaches. Music Audio-Visual Question Answering (Music AVQA) particularly underscores this, presenting unique challenges with its continuous, densely layered audio-visual content, intricate temporal dynamics, and the critical need for dom…

    Submitted 26 May, 2025; originally announced May 2025.

  11. arXiv:2505.18185  [pdf, ps, other]

    eess.SP cs.LG

    BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals

    Authors: Qinfan Xiao, Ziyun Cui, Chi Zhang, Siqi Chen, Wen Wu, Andrew Thwaites, Alexandra Woolgar, Bowen Zhou, Chao Zhang

    Abstract: Electroencephalography (EEG) and magnetoencephalography (MEG) measure neural activity non-invasively by capturing electromagnetic fields generated by dendritic currents. Although rooted in the same biophysics, EEG and MEG exhibit distinct signal patterns, further complicated by variations in sensor configurations across modalities and recording devices. Existing approaches typically rely on separa…

    Submitted 18 May, 2025; originally announced May 2025.

  12. arXiv:2505.15058  [pdf, ps, other]

    cs.SD cs.AI cs.CV cs.GR eess.AS

    AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars

    Authors: Tianbao Zhang, Jian Zhao, Yuer Li, Zheng Zhu, Ping Hu, Zhaoxin Fan, Wenjun Wu, Xuelong Li

    Abstract: Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital entertainment, and remote communication. Existing approaches often generate audio-driven facial expressions and gestures independently, which introduces a signif…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 11 pages, conference

    MSC Class: 68T10

  13. arXiv:2505.10502  [pdf, ps, other]

    eess.IV

    WeGA: Weakly-Supervised Global-Local Affinity Learning Framework for Lymph Node Metastasis Prediction in Rectal Cancer

    Authors: Yifan Gao, Yaoxian Dong, Wenbin Wu, Chaoyang Ge, Feng Yuan, Jiaxi Sheng, Haoyue Li, Xin Gao

    Abstract: Accurate lymph node metastasis (LNM) assessment in rectal cancer is essential for treatment planning, yet current MRI-based evaluation shows unsatisfactory accuracy, leading to suboptimal clinical decisions. Developing automated systems also faces significant obstacles, primarily the lack of node-level annotations. Previous methods treat lymph nodes as isolated entities rather than as an interconn…

    Submitted 16 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

  14. arXiv:2505.09985  [pdf]

    eess.IV cs.CV

    Ordered-subsets Multi-diffusion Model for Sparse-view CT Reconstruction

    Authors: Pengfei Yu, Bin Huang, Minghui Zhang, Weiwen Wu, Shaoyu Wang, Qiegen Liu

    Abstract: Score-based diffusion models have shown significant promise in the field of sparse-view CT reconstruction. However, the projection dataset is large and riddled with redundancy. Consequently, applying the diffusion model to unprocessed data results in lower learning effectiveness and higher learning difficulty, frequently leading to reconstructed images that lack fine details. To address these issu…

    Submitted 15 May, 2025; originally announced May 2025.

  15. arXiv:2505.08414  [pdf]

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  16. arXiv:2505.07687  [pdf, ps, other]

    eess.IV cs.CV

    ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

    Authors: Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

    Abstract: Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for pr…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025 (under review)

  17. arXiv:2505.06256  [pdf, other]

    eess.SP cs.AI

    SpectrumFM: A Foundation Model for Intelligent Spectrum Management

    Authors: Fuhui Zhou, Chunyu Liu, Hao Zhang, Wei Wu, Qihui Wu, Derrick Wing Kwan Ng, Tony Q. S. Quek, Chan-Byoung Chae

    Abstract: Intelligent spectrum management is crucial for improving spectrum efficiency and achieving secure utilization of spectrum resources. However, existing intelligent spectrum management methods, typically based on small-scale models, suffer from notable limitations in recognition accuracy, convergence speed, and generalization, particularly in the complex and dynamic spectrum environments. To address…

    Submitted 2 May, 2025; originally announced May 2025.

  18. arXiv:2505.04453  [pdf, ps, other]

    eess.SP

    Meta-Learning Driven Lightweight Phase Shift Compression for IRS-Assisted Wireless Systems

    Authors: Xianhua Yu, Dong Li, Bowen Gu, Xiaoye Jing, Wen Wu, Tuo Wu, Kan Yu

    Abstract: The phase shift information (PSI) overhead poses a critical challenge to enabling real-time intelligent reflecting surface (IRS)-assisted wireless systems, particularly under dynamic and resource-constrained conditions. In this paper, we propose a lightweight PSI compression framework, termed meta-learning-driven compression and reconstruction network (MCRNet). By leveraging a few-shot adaptation…

    Submitted 7 May, 2025; originally announced May 2025.

  19. arXiv:2504.11871  [pdf, ps, other]

    eess.SP

    Channel-Adaptive Robust Resource Allocation for Highly Reliable IRS-Assisted V2X Communications

    Authors: Peng Wang, Weihua Wu

    Abstract: This paper addresses the challenges of resource allocation in vehicular networks enhanced by Intelligent Reflecting Surfaces (IRS), considering the uncertain Channel State Information (CSI) typical of vehicular environments due to the Doppler shift. Leveraging the 3GPP's Mode 1 cellular V2X architecture, our system model facilitates efficient subcarrier usage and interference reduction through coo…

    Submitted 16 April, 2025; originally announced April 2025.

  20. arXiv:2504.00750  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction

    Authors: Wenxuan Wu, Xueyuan Chen, Shuai Wang, Jiadong Wang, Lingwei Meng, Xixin Wu, Helen Meng, Haizhou Li

    Abstract: Audio-Visual Target Speaker Extraction (AV-TSE) aims to mimic the human ability to enhance auditory perception using visual cues. Although numerous models have been proposed recently, most of them estimate target signals by primarily relying on local dependencies within acoustic features, underutilizing the human-like capacity to infer unclear parts of speech through contextual information. This l…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing (JSTSP)

  21. arXiv:2502.19924  [pdf, other]

    cs.SD cs.AI eess.AS

    DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models

    Authors: Weihao Wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu

    Abstract: Conversational speech synthesis (CSS) aims to synthesize both contextually appropriate and expressive speech, and considerable efforts have been made to enhance the understanding of conversational context. However, existing CSS systems are limited to deterministic prediction, overlooking the diversity of potential responses. Moreover, they rarely employ language model (LM)-based TTS backbones, lim…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted by ICASSP 2025

  22. arXiv:2502.07230  [pdf, ps, other]

    eess.SY

    Physics-Informed Recurrent Network for State-Space Modeling of Gas Pipeline Networks

    Authors: Siyuan Wang, Wenchuan Wu, Chenhui Lin, Qi Wang, Shuwei Xu, Binbin Chen

    Abstract: As a part of the integrated energy system (IES), gas pipeline networks can provide additional flexibility to power systems through coordinated optimal dispatch. An accurate pipeline network model is critical for the optimal operation and control of IESs. However, inaccuracies or unavailability of accurate pipeline parameters often introduce errors in the state-space models of such networks. This p…

    Submitted 19 June, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: 9 pages

  23. arXiv:2502.06710  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Learning Musical Representations for Music Performance Question Answering

    Authors: Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Jiang Gui

    Abstract: Music performances are representative scenarios for audio-visual modeling. Unlike common scenarios with sparse audio, music performances continuously involve dense audio signals throughout. While existing multimodal learning methods on the audio-video QA demonstrate impressive capabilities in general scenarios, they are incapable of dealing with fundamental problems within the music performances:…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted at EMNLP 2024

  24. arXiv:2502.06020  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

    Authors: Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui

    Abstract: Multimodal foundation models (MFMs) have demonstrated significant success in tasks such as visual captioning, question answering, and image-text retrieval. However, these models face inherent limitations due to their finite internal capacity, which restricts their ability to process extended temporal sequences, a crucial requirement for comprehensive video and audio analysis. To overcome these cha…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted at NAACL 2025

  25. Terahertz Integrated Sensing and Communication-Empowered UAVs in 6G: A Transceiver Design Perspective

    Authors: Ruoyu Zhang, Wen Wu, Xiaoming Chen, Zhen Gao, Yueming Cai

    Abstract: Due to their high maneuverability, flexible deployment, and low cost, unmanned aerial vehicles (UAVs) are expected to play a pivotal role in not only communication, but also sensing. Especially by exploiting the ultra-wide bandwidth of terahertz (THz) bands, integrated sensing and communication (ISAC)-empowered UAV has been a promising technology of 6G space-air-ground integrated networks. In this…

    Submitted 7 February, 2025; originally announced February 2025.

    Journal ref: IEEE Vehicular Technology Magazine, 2025

  26. arXiv:2501.07270  [pdf, other]

    eess.SP

    Dual-Function Beamforming Design For Multi-Target Localization and Reliable Communications

    Authors: Bo Tang, Da Li, Wenjun Wu, Astha Saini, Prabhu Babu, Petre Stoica

    Abstract: This paper investigates the transmit beamforming design for multiple-input multiple-output systems to support both multi-target localization and multi-user communications. To enhance the target localization performance, we derive the asymptotic Cramér-Rao bound (CRB) for target angle estimation by assuming that the receive array is linear and uniform. Then we formulate a beamforming design problem…

    Submitted 13 January, 2025; originally announced January 2025.

    Comments: 31 pages, 14 figures

  27. arXiv:2501.06474  [pdf, other]

    eess.AS cs.SD

    The 1st SpeechWellness Challenge: Detecting Suicide Risk Among Adolescents

    Authors: Wen Wu, Ziyun Cui, Chang Lei, Yinan Duan, Diyang Qu, Ji Wu, Bowen Zhou, Runsen Chen, Chao Zhang

    Abstract: The 1st SpeechWellness Challenge (SW1) aims to advance methods for detecting current suicide risk in adolescents using speech analysis techniques. Suicide among adolescents is a critical public health issue globally. Early detection of suicidal tendencies can lead to timely intervention and potentially save lives. Traditional methods of assessment often rely on self-reporting or clinical interview…

    Submitted 20 May, 2025; v1 submitted 11 January, 2025; originally announced January 2025.

  28. arXiv:2501.05961  [pdf, other]

    cs.CV eess.IV

    Swin-X2S: Reconstructing 3D Shape from 2D Biplanar X-ray with Swin Transformers

    Authors: Kuan Liu, Zongyuan Ying, Jie Jin, Dongyan Li, Ping Huang, Wenjian Wu, Zhe Chen, Jin Qi, Yong Lu, Lianfu Deng, Bo Chen

    Abstract: The conversion from 2D X-ray to 3D shape holds significant potential for improving diagnostic efficiency and safety. However, existing reconstruction methods often rely on hand-crafted features, manual intervention, and prior knowledge, resulting in unstable shape errors and additional processing costs. In this paper, we introduce Swin-X2S, an end-to-end deep learning method for directly reconstru…

    Submitted 10 January, 2025; originally announced January 2025.

  29. arXiv:2412.18831  [pdf, ps, other]

    math.OC eess.SY

    Data-driven $H_{\infty}$ predictive control for constrained systems: a Lagrange duality approach

    Authors: Wenhuang Wu, Lulu Guo, Nan Li, Hong Chen

    Abstract: This article proposes a data-driven $H_{\infty}$ control scheme for time-domain constrained systems based on model predictive control formulation. The scheme combines $H_{\infty}$ control and minimax model predictive control, enabling more effective handling of external disturbances and time-domain constraints. First, by leveraging input-output-disturbance data, the scheme ensures $H_{\infty}$ per…

    Submitted 17 March, 2025; v1 submitted 25 December, 2024; originally announced December 2024.

    Comments: 11 pages, 4 figures

  30. arXiv:2412.01303  [pdf]

    eess.SY cs.AI

    RL2: Reinforce Large Language Model to Assist Safe Reinforcement Learning for Energy Management of Active Distribution Networks

    Authors: Xu Yang, Chenhui Lin, Haotian Liu, Wenchuan Wu

    Abstract: As large-scale distributed energy resources are integrated into the active distribution networks (ADNs), effective energy management in ADNs becomes increasingly prominent compared to traditional distribution networks. Although advanced reinforcement learning (RL) methods, which alleviate the burden of complicated modelling and optimization, have greatly improved the efficiency of energy managemen…

    Submitted 2 December, 2024; originally announced December 2024.

  31. arXiv:2412.01100  [pdf, other]

    cs.SD eess.AS

    The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024

    Authors: Shuoyi Zhou, Yixuan Zhou, Weiqin Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu

    Abstract: This paper describes the zero-shot spontaneous style TTS system for the ISCSLP 2024 Conversational Voice Clone Challenge (CoVoC). We propose a LLaMA-based codec language model with a delay pattern to achieve spontaneous style voice cloning. To improve speech intelligibility, we introduce the Classifier-Free Guidance (CFG) strategy in the language model to strengthen conditional guidance on token p…

    Submitted 4 February, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: Accepted by ISCSLP 2024

  32. arXiv:2411.18329  [pdf, other]

    eess.SP cs.IT

    Two-Timescale Digital Twin Assisted Model Interference and Retraining over Wireless Network

    Authors: Jiayi Cong, Guoliang Cheng, Changsheng You, Xinyu Huang, Wen Wu

    Abstract: In this paper, we investigate a resource allocation and model retraining problem for dynamic wireless networks by utilizing incremental learning, in which the digital twin (DT) scheme is employed for decision making. A two-timescale framework is proposed for computation resource allocation, mobile user association, and incremental training of user models. To obtain an optimal resource allocation a…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: 6 pages, 4 figures

  33. arXiv:2411.18129  [pdf, other]

    cs.NI eess.SP

    Edge-Assisted Accelerated Cooperative Sensing for CAVs: Task Placement and Resource Allocation

    Authors: Yuxuan Wang, Kaige Qu, Wen Wu, Xuemin Shen

    Abstract: In this paper, we propose a novel road side unit (RSU)-assisted cooperative sensing scheme for connected autonomous vehicles (CAVs), with the objective to reduce completion time of sensing tasks. Specifically, LiDAR sensing data of both RSU and CAVs are selectively fused to improve sensing accuracy, and computing resources therein are cooperatively utilized to process tasks in real time. To this e…

    Submitted 27 November, 2024; originally announced November 2024.

  34. arXiv:2411.08538  [pdf]

    physics.app-ph eess.SY

    Intelligent Adaptive Metasurface in Complex Wireless Environments

    Authors: Han Qing Yang, Jun Yan Dai, Hui Dong Li, Lijie Wu, Meng Zhen Zhang, Zi Hang Shen, Si Ran Wang, Zheng Xing Wang, Wankai Tang, Shi Jin, Jun Wei Wu, Qiang Cheng, Tie Jun Cui

    Abstract: The programmable metasurface is regarded as one of the most promising transformative technologies for next-generation wireless system applications. Due to the lack of effective perception ability of the external electromagnetic environment, there are numerous challenges in the intelligent regulation of wireless channels, and it still relies on external sensors to reshape electromagnetic environmen…

    Submitted 13 November, 2024; originally announced November 2024.

  35. arXiv:2410.14965  [pdf, other]

    eess.IV cs.CV

    Non-Invasive to Invasive: Enhancing FFA Synthesis from CFP with a Benchmark Dataset and a Novel Network

    Authors: Hongqiu Wang, Zhaohu Xing, Weitong Wu, Yijun Yang, Qingqing Tang, Meixia Zhang, Yanwu Xu, Lei Zhu

    Abstract: Fundus imaging is a pivotal tool in ophthalmology, and different imaging modalities are characterized by their specific advantages. For example, Fundus Fluorescein Angiography (FFA) uniquely provides detailed insights into retinal vascular dynamics and pathology, surpassing Color Fundus Photographs (CFP) in detecting microvascular abnormalities and perfusion status. However, the conventional invas…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: ACMMM 24 MCHM

  36. arXiv:2410.14882  [pdf]

    cs.AR eess.SP

    Multi-diseases detection with memristive system on chip

    Authors: Zihan Wang, Daniel W. Yang, Zerui Liu, Evan Yan, Heming Sun, Ning Ge, Miao Hu, Wei Wu

    Abstract: This study presents the first implementation of multilayer neural networks on a memristor/CMOS integrated system on chip (SoC) to simultaneously detect multiple diseases. To overcome limitations in medical data, generative AI techniques are used to enhance the dataset, improving the classifier's robustness and diversity. The system achieves notable performance with low latency, high accuracy (91.8…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 14 pages, 5 figures

    ACM Class: C.1.3; I.2.0

  37. arXiv:2410.06115  [pdf, other]

    cs.IT eess.SP

    A physics-based perspective for understanding and utilizing spatial resources of wireless channels

    Authors: Hui Xu, Jun Wei Wu, Zhen Jie Qi, Hao Tian Wu, Rui Wen Shao, Qiang Cheng, Jieao Zhu, Linglong Dai, Tie Jun Cui

    Abstract: To satisfy the increasing demands for transmission rates of wireless communications, it is necessary to use spatial resources of electromagnetic (EM) waves. In this context, EM information theory (EIT) has become a hot topic by integrating the theoretical framework of deterministic mathematics and stochastic statistics to explore the transmission mechanisms of continuous EM waves. However, the pre…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 31 pages, 8 figures

  38. arXiv:2410.01584  [pdf, other]

    cs.NI eess.SY

    AI-Native Network Digital Twin for Intelligent Network Management in 6G

    Authors: Wen Wu, Xinyu Huang, Tom H. Luan

    Abstract: As a pivotal virtualization technology, network digital twin is expected to accurately reflect real-time status and abstract features in the on-going sixth generation (6G) networks. In this article, we propose an artificial intelligence (AI)-native network digital twin framework for 6G networks to enable the synergy of AI and network digital twin, thereby facilitating intelligent network managemen…

    Submitted 9 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: This article is submitted to IEEE Wireless Communications

  39. arXiv:2410.01072  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Generating Seamless Virtual Immunohistochemical Whole Slide Images with Content and Color Consistency

    Authors: Sitong Liu, Kechun Liu, Samuel Margolis, Wenjun Wu, Stevan R. Knezevich, David E Elder, Megan M. Eguchi, Joann G Elmore, Linda Shapiro

    Abstract: Immunohistochemical (IHC) stains play a vital role in a pathologist's analysis of medical images, providing crucial diagnostic information for various diseases. Virtual staining from hematoxylin and eosin (H&E)-stained whole slide images (WSIs) allows the automatic production of other useful IHC stains without the expensive physical staining process. However, current virtual WSI generation methods…

    Submitted 1 October, 2024; originally announced October 2024.

  40. arXiv:2409.19882  [pdf, ps, other]

    eess.SY math.NA math.OC

    Tannenbaum's gain-margin optimization meets Polyak's heavy-ball algorithm

    Authors: Wuwei Wu, Jie Chen, Mihailo R. Jovanović, Tryphon T. Georgiou

    Abstract: The paper highlights a relatively unknown link between algorithm design in optimization and control synthesis in robust control. Specifically, quadratic optimization can be recast as a regulation problem within the framework of $\mathcal{H}_\infty$ control. From this vantage point, the optimality of Polyak's fastest heavy-ball algorithm can be ascertained as a solution to a gain margin optimizatio…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 25 pages, 8 figures

    MSC Class: 93B36; 93B52; 65-XX; 49Mxx; 49M15; 30E05

  41. arXiv:2409.17596  [pdf, other]

    cs.MM cs.AI eess.IV

    Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming

    Authors: Zehao Zhu, Wei Sun, Jun Jia, Wei Wu, Sibin Deng, Kai Li, Ying Chen, Xiongkuo Min, Jia Wang, Guangtao Zhai

    Abstract: In recent years, live video streaming has gained widespread popularity across various social media platforms. Quality of experience (QoE), which reflects end-users' satisfaction and overall experience, plays a critical role for media service providers to optimize large-scale live compression and transmission strategies to achieve perceptually optimal rate-distortion trade-off. Although many QoE me…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 14 pages, 5 figures

  42. arXiv:2409.17352  [pdf, other]

    cs.SI eess.SY

    On the Interplay of Clustering and Evolution in the Emergence of Epidemic Outbreaks

    Authors: Mansi Sood, Hejin Gu, Rashad Eletreby, Swarun Kumar, Chai Wah Wu, Osman Yagan

    Abstract: In an increasingly interconnected world, a key scientific challenge is to examine mechanisms that lead to the widespread propagation of contagions, such as misinformation and pathogens, and identify risk factors that can trigger large-scale outbreaks. Underlying both the spread of disease and misinformation epidemics is the evolution of the contagion as it propagates, leading to the emergence of d…

    Submitted 25 September, 2024; originally announced September 2024.

  43. arXiv:2409.08596  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

    Authors: Lingwei Meng, Shujie Hu, Jiawen Kang, Zhaoqing Li, Yuejiao Wang, Wenxuan Wu, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Recent advancements in large language models (LLMs) have revolutionized various domains, bringing significant progress and new opportunities. Despite progress in speech-related tasks, LLMs have not been sufficiently explored in multi-talker scenarios. In this work, we present a pioneering effort to investigate the capability of LLMs in transcribing speech in multi-talker environments, following ve…

    Submitted 2 April, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE ICASSP 2025. Update code link

  44. arXiv:2408.04358  [pdf, other]

    eess.SY

    Goal-Oriented UAV Communication Design and Optimization for Target Tracking: A Machine Learning Approach

    Authors: Wenchao Wu, Yanning Wu, Yuanqing Yang, Yansha Deng

    Abstract: To accomplish various tasks, safe and smooth control of unmanned aerial vehicles (UAVs) needs to be guaranteed, which cannot be met by existing ultra-reliable low latency communications (URLLC). This has attracted the attention of the communication field, where most existing work mainly focused on optimizing communication performance (i.e., delay) and ignored the performance of the task (i.e., tra…

    Submitted 8 August, 2024; originally announced August 2024.

  45. arXiv:2407.21381  [pdf, other]

    eess.IV cs.CV

    Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging

    Authors: Wenhua Wu, Kun Hu, Wenxi Yue, Wei Li, Milena Simic, Changyang Li, Wei Xiang, Zhiyong Wang

    Abstract: Knee osteoarthritis (KOA), a common form of arthritis that causes physical disability, has become increasingly prevalent in society. Employing computer-aided techniques to automatically assess the severity and progression of KOA can greatly benefit KOA treatment and disease management. Particularly, the advancement of X-ray technology in KOA demonstrates its potential for this purpose. Yet, existi…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  46. arXiv:2407.19220  [pdf]

    physics.ed-ph eess.SY

    A Low-Frequency Vibration Experimental Platform for University Physics Experiment Designed by LabVIEW

    Authors: Yangjie Dai, Leijian Wang, Wenbin Wu, Aiping Chen, Dawei Gu

    Abstract: Virtual instrument technology has been increasingly used in university physics experiment teaching. An experimental platform is specifically constructed for studying low-frequency vibrations in university physics, which is based on a computer and its internal sound card, along with a program developed in LabVIEW programming environment to perform control and measurement on our experimental platfor…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures, 2 supplementary files

  47. Channel Estimation for Movable-Antenna MIMO Systems Via Tensor Decomposition

    Authors: Ruoyu Zhang, Lei Cheng, Wei Zhang, Xinrong Guan, Yueming Cai, Wen Wu, Rui Zhang

    Abstract: In this letter, we investigate the channel estimation problem for MIMO wireless communication systems with movable antennas (MAs) at both the transmitter (Tx) and receiver (Rx). To achieve high channel estimation accuracy with low pilot training overhead, we propose a tensor decomposition-based method for estimating the parameters of multi-path channel components, including their azimuth and eleva…

    Submitted 6 January, 2025; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: 5 pages, 3 figures

    Journal ref: IEEE Wireless Communications Letters, vol. 13, no. 11, pp. 3089-3093, Nov. 2024

  48. arXiv:2407.11481  [pdf, other]

    cs.LG cs.AI eess.SP

    Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG

    Authors: Jiarong Chen, Wanqing Wu, Tong Liu, Shenda Hong

    Abstract: Electrocardiogram (ECG) has emerged as a widely accepted diagnostic instrument for cardiovascular diseases (CVD). The standard clinical 12-lead ECG configuration causes considerable inconvenience and discomfort, while wearable devices offer a more practical alternative. To reduce the information gap between 12-lead ECG and single-lead ECG, this study proposes a multi-channel masked autoencoder (MCMA)…

    Submitted 3 October, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: It is a revised version. The open-source code is publicly available at https://github.com/CHENJIAR3/MCMA

  49. arXiv:2406.19608  [pdf, other]

    eess.SY

    Multi-service collaboration and composition of cloud manufacturing customized production based on problem decomposition

    Authors: Hao Yue, Yingtao Wu, Min Wang, Hesuan Hu, Weimin Wu, Jihui Zhang

    Abstract: Cloud manufacturing system is a service-oriented and knowledge-based one, which can provide solutions for the large-scale customized production. The service resource allocation is the primary factor that restricts the production time and cost in the cloud manufacturing customized production (CMCP). In order to improve the efficiency and reduce the cost in CMCP, we propose a new framework which con…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

    ACM Class: J.0

  50. Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

    Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

    Abstract: The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts. This paper studies the automatic detection of suicide risk based on spontaneous speech from adolescents, and collects a Mandarin dataset with 15 hours of suicide speech from more than a thousand adolescents aged from ten to eighteen for our experiments. To leverage the diverse…

    Submitted 9 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024