Skip to main content

Showing 1–50 of 85 results for author: Wu, P

Searching in archive eess. Search in all archives.
.
  1. arXiv:2506.13293  [pdf

    eess.IV

    SUSEP-Net: Simulation-Supervised and Contrastive Learning-based Deep Neural Networks for Susceptibility Source Separation

    Authors: Min Li, Chen Chen, Zhenghao Li, Yin Liu, Shanshan Shan, Peng Wu, Pengfei Rong, Feng Liu, G. Bruce Pike, Alan H. Wilman, Hongfu Sun, Yang Gao

    Abstract: Quantitative susceptibility mapping (QSM) provides a valuable tool for quantifying susceptibility distributions in human brains; however, two types of opposing susceptibility sources (i.e., paramagnetic and diamagnetic), may coexist in a single voxel, and cancel each other out in net QSM images. Susceptibility source separation techniques enable the extraction of sub-voxel information from QSM map… ▽ More

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 8 figures, 2 tables

  2. arXiv:2506.11438  [pdf, ps, other

    eess.SP

    Movable-Antenna Array Enhanced Downlink NOMA

    Authors: Nianzu Li, Peiran Wu, Lipeng Zhu, Derrick Wing Kwan Ng

    Abstract: Movable antenna (MA) has gained increasing attention in the field of wireless communications due to its exceptional capability to proactively reconfigure wireless channels via localized antenna movements. In this paper, we investigate the resource allocation design for an MA array-enabled base station serving multiple single-antenna users in a downlink non-orthogonal multiple access (NOMA) system.… ▽ More

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Accepted in 2025 IEEE ICC Workshops

  3. arXiv:2506.00975  [pdf, ps, other

    cs.CL cs.AI cs.SD eess.AS

    NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction

    Authors: Qichao Wang, Ziqiao Meng, Wenqian Cui, Yifei Zhang, Pengcheng Wu, Bingzhe Wu, Irwin King, Liang Chen, Peilin Zhao

    Abstract: Inspired by the impressive capabilities of GPT-4o, there is growing interest in enabling speech language models (SLMs) to engage in natural, fluid spoken interactions with humans. Recent advancements have led to the development of several SLMs that demonstrate promising results in this area. However, current approaches have yet to fully exploit dual-channel speech data, which inherently captures t… ▽ More

    Submitted 11 June, 2025; v1 submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted by ICML 2025

  4. arXiv:2505.12089  [pdf, ps, other

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results

    Authors: Sangmin Lee, Eunpil Park, Angel Canelo, Hyunhee Park, Youngjo Kim, Hyung-Ju Chun, Xin Jin, Chongyi Li, Chun-Le Guo, Radu Timofte, Qi Wu, Tianheng Qiu, Yuchun Dong, Shenglin Ding, Guanghua Pan, Weiyu Zhou, Tao Hu, Yixu Feng, Duwei Dai, Yu Cao, Peng Wu, Wei Dong, Yanning Zhang, Qingsen Yan, Simon J. Larsen , et al. (11 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effect… ▽ More

    Submitted 17 May, 2025; originally announced May 2025.

  5. arXiv:2503.02318  [pdf, other

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

    Authors: Zhifei Xie, Mingbao Lin, Zihang Liu, Pengcheng Wu, Shuicheng Yan, Chunyan Miao

    Abstract: Recent advancements in multimodal reasoning have largely overlooked the audio modality. We introduce Audio-Reasoner, a large-scale audio language model for deep reasoning in audio tasks. We meticulously curated a large-scale and diverse multi-task audio dataset with simple annotations. Then, we leverage closed-source models to conduct secondary labeling, QA generation, along with structured COT pr… ▽ More

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Technical report, in process

  6. arXiv:2502.12817  [pdf, ps, other

    eess.SP cs.SD

    An Attention-Assisted Multi-Modal Data Fusion Model for Real-Time Estimation of Underwater Sound Velocity

    Authors: Pengfei Wu, Wei Huang, Yujie Shi, Hao Zhang

    Abstract: The estimation of underwater sound velocity distribution serves as a critical basis for facilitating effective underwater communication and precise positioning, given that variations in sound velocity influence the path of signal transmission. Conventional techniques for the direct measurement of sound velocity, as well as methods that involve the inversion of sound velocity utilizing acoustic fie… ▽ More

    Submitted 2 March, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  7. arXiv:2502.03974  [pdf

    eess.SY

    Spatiotemporal Trajectory Tracking Method for Vehicles Incorporating Lead-Lag Judgement

    Authors: Yuan Li, Xiang Dong, Tao Li, Junfeng Hao, Xiaoxue Xu, Sana Ullaha, Yincai Cai, Peng Wu, Ting Peng

    Abstract: In the domain of intelligent transportation systems, especially within the context of autonomous vehicle control, the preemptive holistic collaborative system has been presented as a promising solution to bring a remarkable enhancement in traffic efficiency and a substantial reduction in the accident rate, demonstrating a great potential of development. In order to ensure this system operates as i… ▽ More

    Submitted 6 February, 2025; originally announced February 2025.

  8. arXiv:2501.15385  [pdf, other

    cs.CV eess.IV

    DDUNet: Dual Dynamic U-Net for Highly-Efficient Cloud Segmentation

    Authors: Yijie Li, Hewei Wang, Jinfeng Xu, Puzhen Wu, Yunzhong Xiao, Shaofan Wang, Soumyabrata Dev

    Abstract: Cloud segmentation amounts to separating cloud pixels from non-cloud pixels in an image. Current deep learning methods for cloud segmentation suffer from three issues. (a) Constrain on their receptive field due to the fixed size of the convolution kernel. (b) Lack of robustness towards different scenarios. (c) Requirement of a large number of parameters and limitations for real-time implementation… ▽ More

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: 5 pages

  9. arXiv:2501.07989  [pdf, ps, other

    eess.SP

    Movable Antenna Enhanced DF and AF Relaying Systems: Performance Analysis and Optimization

    Authors: Nianzu Li, Weidong Mei, Peiran Wu, Boyu Ning, Lipeng Zhu

    Abstract: Movable antenna (MA) has been deemed as a promising technology to flexibly reconfigure wireless channels by adjusting the antenna positions in a given local region. In this paper, we investigate the application of the MA technology in both decode-and-forward (DF) and amplify-and-forward (AF) relaying systems, where a relay is equipped with multiple MAs to assist in the data transmission between tw… ▽ More

    Submitted 14 January, 2025; originally announced January 2025.

  10. arXiv:2412.13387  [pdf, other

    eess.AS cs.SD

    Deep Speech Synthesis from Multimodal Articulatory Representations

    Authors: Peter Wu, Bohan Yu, Kevin Scheck, Alan W Black, Aditi S. Krishnapriyan, Irene Y. Chen, Tanja Schultz, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: The amount of articulatory data available for training deep learning models is much less compared to acoustic speech data. In order to improve articulatory-to-acoustic synthesis performance in these low-resource settings, we propose a multimodal pre-training framework. On single-speaker speech synthesis tasks from real-time magnetic resonance imaging and surface electromyography inputs, the intell… ▽ More

    Submitted 17 December, 2024; originally announced December 2024.

  11. arXiv:2411.07603  [pdf, other

    quant-ph eess.SY

    $\mathscr{H}_2$ Model Reduction for Linear Quantum Systems

    Authors: G. P. Wu, S. Xue, G. F. Zhang, I. R. Petersen

    Abstract: In this paper, an $\mathscr{H}_2$ norm-based model reduction method for linear quantum systems is presented, which can obtain a physically realizable model with a reduced order for closely approximating the original system. The model reduction problem is described as an optimization problem, whose objective is taken as an $\mathscr{H}_2$ norm of the difference between the transfer function of the… ▽ More

    Submitted 19 November, 2024; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: 13 pages,3 figures

  12. arXiv:2411.06449  [pdf, other

    cs.CV eess.IV

    Improved Video VAE for Latent Video Diffusion Model

    Authors: Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha

    Abstract: Variational Autoencoder (VAE) aims to compress pixel data into low-dimensional latent space, playing an important role in OpenAI's Sora and other latent video diffusion generation models. While most of existing video VAEs inflate a pretrained image VAE into the 3D causal structure for temporal-spatial compression, this paper presents two astonishing findings: (1) The initialization from a well-tra… ▽ More

    Submitted 10 November, 2024; originally announced November 2024.

  13. Over-the-Air Computation via 2D Movable Antenna Array

    Authors: Nianzu Li, Peiran Wu, Boyu Ning, Lipeng Zhu, Weidong Mei

    Abstract: Movable antenna (MA) has emerged as a promising technology for improving the performance of wireless communication systems, which enables local movement of the antennas to create more favorable channel conditions. In this letter, we advance its application for over-the-air computation (AirComp) network, where an access point is equipped with a two-dimensional (2D) MA array to aggregate wireless da… ▽ More

    Submitted 16 September, 2024; originally announced September 2024.

    Journal ref: IEEE Wireless Communications Letters, vol. 14, no. 1, pp. 33-37, Jan. 2025

  14. arXiv:2409.02451  [pdf, other

    eess.AS cs.AI cs.SD

    Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP

    Authors: Yisi Liu, Bohan Yu, Drake Lin, Peter Wu, Cheol Jun Cho, Gopala Krishna Anumanchipalli

    Abstract: Articulatory trajectories like electromagnetic articulography (EMA) provide a low-dimensional representation of the vocal tract filter and have been used as natural, grounded features for speech synthesis. Differentiable digital signal processing (DDSP) is a parameter-efficient framework for audio synthesis. Therefore, integrating low-dimensional EMA features with DDSP can significantly enhance th… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: accepted for Spoken Language Technology Workshop 2024

  15. Enhancing Expressway Ramp Merge Safety and Efficiency via Spatiotemporal Cooperative Control

    Authors: Ting Peng, Xiaoxue Xu, Yuan Li, Jie WU, Tao Li, Xiang Dong, Yincai Cai, Peng Wu, Sana Ullah

    Abstract: In the context of autonomous driving on expressways, the issue of ensuring safe and efficient ramp merging remains a significant challenge. Existing systems often struggle to accurately assess the status and intentions of other vehicles, leading to a persistent occurrence of accidents despite efforts to maintain safe distances. This study proposes a novel spatiotemporal cooperative control approac… ▽ More

    Submitted 14 February, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

    Journal ref: IEEE Access, vol. 13, pp. 25664-25682, 2025

  16. Sum Rate Maximization for Movable Antenna Enabled Uplink NOMA

    Authors: Nianzu Li, Peiran Wu, Boyu Ning, Lipeng Zhu

    Abstract: Movable antenna (MA) has been recently proposed as a promising candidate technology for the next generation wireless communication systems due to its significant capability of reconfiguring wireless channels via antenna movement. In this letter, we study an MA-enabled uplink non-orthogonal multiple access (NOMA) system, where each user is equipped with a single MA. Our objective is to maximize the… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 5 pages, 3 figures. Accepted to IEEE Wireless Communications Letters

    Journal ref: IEEE Wireless Communications Letters, vol. 13, no. 8, pp. 2140-2144, Aug. 2024

  17. arXiv:2408.05746  [pdf, ps, other

    cs.IT eess.SP

    Movable Antenna Enhanced AF Relaying: Two-Stage Antenna Position Optimization

    Authors: Nianzu Li, Weidong Mei, Boyu Ning, Peiran Wu

    Abstract: The movable antenna (MA) technology has attracted increasing attention in wireless communications due to its capability for flexibly adjusting the positions of multiple antennas in a local region to reconfigure channel conditions. In this paper, we investigate its application in an amplify-and-forward (AF) relay system, where a multi-MA AF relay is deployed to assist in the wireless communications… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

  18. arXiv:2408.02934  [pdf, other

    cs.IT eess.SP

    Learned Trimmed-Ridge Regression for Channel Estimation in Millimeter-Wave Massive MIMO

    Authors: Pengxia Wu, Julian Cheng, Yonina C. Eldar, John M. Cioffi

    Abstract: Channel estimation poses significant challenges in millimeter-wave massive multiple-input multiple-output systems, especially when the base station has fewer radio-frequency chains than antennas. To address this challenge, one promising solution exploits the beamspace channel sparsity to reconstruct full-dimensional channels from incomplete measurements. This paper presents a model-based deep lear… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Communications

  19. arXiv:2407.21345  [pdf, other

    eess.AS

    Towards EMG-to-Speech with a Necklace Form Factor

    Authors: Peter Wu, Ryan Kaveh, Raghav Nautiyal, Christine Zhang, Albert Guo, Anvitha Kachinthaya, Tavish Mishra, Bohan Yu, Alan W Black, Rikky Muller, Gopala Krishna Anumanchipalli

    Abstract: Electrodes for decoding speech from electromyography (EMG) are typically placed on the face, requiring adhesives that are inconvenient and skin-irritating if used regularly. We explore a different device form factor, where dry electrodes are placed around the neck instead. 11-word, multi-speaker voiced EMG classifiers trained on data recorded with this device achieve 92.7% accuracy. Ablation studi… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  20. arXiv:2407.18627  [pdf, ps, other

    cs.LG eess.SP

    Multi-Agent Deep Reinforcement Learning for Energy Efficient Multi-Hop STAR-RIS-Assisted Transmissions

    Authors: Pei-Hsiang Liao, Li-Hsiang Shen, Po-Chen Wu, Kai-Ten Feng

    Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) provides a promising way to expand coverage in wireless communications. However, limitation of single STAR-RIS inspire us to integrate the concept of multi-hop transmissions, as focused on RIS in existing research. Therefore, we propose the novel architecture of multi-hop STAR-RISs to achieve a wider range of… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by Proc. IEEE VTC-fall

  21. arXiv:2407.17691  [pdf, other

    cs.NI eess.SY

    System-Level Simulation Framework for NB-IoT: Key Features and Performance Evaluation

    Authors: Shutao Zhang, Wenkun Wen, Peiran Wu, Hongqing Huang, Liya Zhu, Yijia Guo, Tingting Yang, Minghua Xia

    Abstract: Narrowband Internet of Things (NB-IoT) is a technology specifically designated by the 3rd Generation Partnership Project (3GPP) to meet the explosive demand for massive machine-type communications (mMTC), and it is evolving to RedCap. Industrial companies have increasingly adopted NB-IoT as the solution for mMTC due to its lightweight design and comprehensive technical specifications released by 3… ▽ More

    Submitted 13 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  22. arXiv:2406.15754  [pdf, other

    cs.CV cs.CL cs.LG cs.SD eess.AS

    Multimodal Segmentation for Vocal Tract Modeling

    Authors: Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli

    Abstract: Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  23. arXiv:2406.12998  [pdf, other

    eess.AS cs.AI cs.CL cs.SD

    Coding Speech through Vocal Tract Kinematics

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- Speech Articulatory Coding (SPARC). SPARC co… ▽ More

    Submitted 14 December, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 18, no. 8, pp. 1427-1440, Dec. 2024

  24. Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes

    Authors: Wei Huang, Pengfei Wu, Tianhe Xu, Hao Zhang, Kaitao Meng

    Abstract: Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenge work. Traditional anchor node positioning typically uses cross or circular shapes, however, how to optimize the… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Journal ref: IEEE Internet of Things Journal, 2024

  25. arXiv:2404.14132  [pdf, other

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  26. arXiv:2404.13537  [pdf, other

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul… ▽ More

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  27. arXiv:2312.15668  [pdf, ps, other

    cs.IT eess.SP

    Air-to-Ground Communications Beyond 5G: UAV Swarm Formation Control and Tracking

    Authors: Xiao Fan, Peiran Wu, Minghua Xia

    Abstract: Unmanned aerial vehicle (UAV) communications have been widely accepted as promising technologies to support air-to-ground communications in the forthcoming sixth-generation (6G) wireless networks. This paper proposes a novel air-to-ground communication model consisting of aerial base stations served by UAVs and terrestrial user equipments (UEs) by integrating the technique of coordinated multi-poi… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 14 pages, 9 figures, to appear in IEEE TWC

  28. arXiv:2312.12810  [pdf, other

    eess.AS cs.SD

    Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection

    Authors: Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli

    Abstract: Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word-level and phonetic-level. However, current research in dysfluency modeling primarily focuses on either transcription or detection, and the performance of each aspect remains limited. In this work, we present an unconstrained dysfluency modeling (UDM) approach that addresses both transcription and dete… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 2023 ASRU

  29. arXiv:2312.09034  [pdf, other

    eess.AS cs.SD eess.IV

    Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection

    Authors: Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson

    Abstract: Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has been recently included. Few audio-visual (AV)-SELD works have been published and most employ vision via face/object bounding boxes, or human pose keypoints. In contrast, we explore th… ▽ More

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  30. arXiv:2312.01566  [pdf, other

    physics.med-ph eess.IV

    Coronary Atherosclerotic Plaque Characterization with Photon-counting CT: a Simulation-based Feasibility Study

    Authors: Mengzhou Li, Mingye Wu, Jed Pack, Pengwei Wu, Bruno De Man, Adam Wang, Koen Nieman, Ge Wang

    Abstract: Recent development of photon-counting CT (PCCT) brings great opportunities for plaque characterization with much-improved spatial resolution and spectral imaging capability. While existing coronary plaque PCCT imaging results are based on detectors made of CZT or CdTe materials, deep-silicon photon-counting detectors have unique performance characteristics and promise distinct imaging capabilities… ▽ More

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 13 figures, 5 tables

  31. arXiv:2311.09537  [pdf, other

    cs.SD eess.AS eess.SP

    Future Full-Ocean Deep SSPs Prediction based on Hierarchical Long Short-Term Memory Neural Networks

    Authors: Jiajun Lu, Hao Zhang, Pengfei Wu, Sijia Li, Wei Huang

    Abstract: The spatial-temporal distribution of underwater sound velocity affects the propagation mode of underwater acoustic signals. Therefore, rapid estimation and prediction of underwater sound velocity distribution is crucial for providing underwater positioning, navigation and timing (PNT) services. Currently, sound speed profile (SSP) inversion methods have a faster time response rate compared to dire… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.09522

  32. arXiv:2310.16287  [pdf, other

    cs.SD cs.GR eess.AS

    Towards Streaming Speech-to-Avatar Synthesis

    Authors: Tejas S. Prabhune, Peter Wu, Bohan Yu, Gopala K. Anumanchipalli

    Abstract: Streaming speech-to-avatar synthesis creates real-time animations for a virtual character from audio data. Accurate avatar representations of speech are important for the visualization of sound in linguistics, phonetics, and phonology, visual feedback to assist second language acquisition, and virtual embodiment for paralyzed patients. Previous works have highlighted the capability of deep articul… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  33. arXiv:2310.14778  [pdf, other

    cs.MM cs.SD eess.AS

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

    Abstract: Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide applications. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter and deep learning-based methods can solve the problem of data association, audio-visual fusion and track ma… ▽ More

    Submitted 13 April, 2025; v1 submitted 23 October, 2023; originally announced October 2023.

  34. Underwater Sound Speed Profile Construction: A Review

    Authors: Wei Huang, Jixuan Zhou, Fan Gao, Jiajun Lu, Sijia Li, Pengfei Wu, Junting Wang, Hao Zhang, Tianhe Xu

    Abstract: Real--time and accurate construction of regional sound speed profiles (SSP) is important for building underwater positioning, navigation, and timing (PNT) systems as it greatly affect the signal propagation modes such as trajectory. In this paper, we summarizes and analyzes the current research status in the field of underwater SSP construction, and the mainstream methods include direct SSP measur… ▽ More

    Submitted 12 October, 2023; originally announced October 2023.

    Journal ref: Journal of Marine Science and Engineering, 2024

  35. arXiv:2310.02497  [pdf, other

    cs.SD cs.LG eess.AS

    Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities

    Authors: Robin Netzorg, Bohan Yu, Andrea Guzman, Peter Wu, Luna McNulty, Gopala Anumanchipalli

    Abstract: Unlike other data modalities such as text and vision, speech does not lend itself to easy interpretation. While lay people can understand how to describe an image or sentence via perception, non-expert descriptions of speech often end at high-level demographic information, such as gender or age. In this paper, we propose a possible interpretable representation of speaker identity based on perceptu… ▽ More

    Submitted 3 October, 2023; originally announced October 2023.

  36. arXiv:2309.07861  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    CiwaGAN: Articulatory information exchange

    Authors: Gašper Beguš, Thomas Lu, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

    Abstract: Humans encode information into sounds by controlling articulators and decode information from sounds using the auditory apparatus. This paper introduces CiwaGAN, a model of human spoken language acquisition that combines unsupervised articulatory modeling with an unsupervised model of information exchange through the auditory modality. While prior research includes unsupervised articulatory modeli… ▽ More

    Submitted 14 September, 2023; originally announced September 2023.

  37. arXiv:2308.05262  [pdf, other

    eess.SP

    Robust Interference Mitigation techniques for Direct Position Estimation

    Authors: Haoqing Li, Shuo Tang, Peng Wu, Pau Closas

    Abstract: Global Navigation Satellite System (GNSS) is pervasive in navigation and positioning applications, where precise position and time referencing estimations are required. Conventional methods for GNSS positioning involve a two-step process, where intermediate measurements such as Doppler shift and time delay of received GNSS signals are computed and then used to solve for the receiver's position. Al… ▽ More

    Submitted 9 August, 2023; originally announced August 2023.

  38. arXiv:2308.03420  [pdf

    eess.SY

    A Safe DRL Method for Fast Solution of Real-Time Optimal Power Flow

    Authors: Pengfei Wu, Chen Chen, Dexiang Lai, Jian Zhong

    Abstract: High-level penetration of intermittent renewable energy sources (RESs) has introduced significant uncertainties into modern power systems. In order to rapidly and economically respond to the fluctuations of power system operating state, this paper proposes a safe deep reinforcement learning (SDRL) based method for fast solution of real-time optimal power flow (RT-OPF) problems. The proposed method… ▽ More

    Submitted 7 August, 2023; originally announced August 2023.

  39. arXiv:2307.16096  [pdf, ps, other

    cs.IT eess.SP

    D-STAR: Dual Simultaneously Transmitting and Reflecting Reconfigurable Intelligent Surfaces for Joint Uplink/Downlink Transmission

    Authors: Li-Hsiang Shen, Po-Chen Wu, Chia-Jou Ku, Yu-Ting Li, Kai-Ten Feng, Yuanwei Liu, Lajos Hanzo

    Abstract: The joint uplink/downlink (JUD) design of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) is conceived in support of both uplink (UL) and downlink (DL) users. Furthermore, the dual STAR-RISs (D-STAR) concept is conceived as a promising architecture for 360-degree full-plane service coverage, including UL/DL users located between the base station (BS) and t… ▽ More

    Submitted 8 February, 2024; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted by IEEE TCOM

  40. arXiv:2307.02471  [pdf, other

    eess.AS

    Deep Speech Synthesis from MRI-Based Articulatory Representations

    Authors: Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W Black, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: In this paper, we study articulatory synthesis, a speech synthesis method using human vocal tract information that offers a way to develop efficient, generalizable and interpretable synthesizers. While recent advances have enabled intelligible articulatory synthesis using electromagnetic articulography (EMA), these methods lack critical articulatory information like excitation and nasality, limiti… ▽ More

    Submitted 5 July, 2023; originally announced July 2023.

  41. arXiv:2306.13558  [pdf, other

    eess.SP

    One-Bit Spectrum Sensing for Cognitive Radio

    Authors: Pei-Wen Wu, Lei Huang, David Ramírez, Yu-Hang Xiao, Hing Cheung So

    Abstract: Spectrum sensing in cognitive radio necessitates effective monitoring of wide bandwidths, which requires high-rate sampling. Traditional spectrum sensing methods employing high-precision analog-to-digital converters (ADCs) result in increased power consumption and expensive hardware costs. In this paper, we explore blind spectrum sensing utilizing one-bit ADCs. We derive a closed-form detector bas… ▽ More

    Submitted 23 June, 2023; originally announced June 2023.

  42. arXiv:2306.10359  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Text-Driven Foley Sound Generation With Latent Diffusion Model

    Authors: Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang

    Abstract: Foley sound generation aims to synthesise the background sound for multimedia content. Previous models usually employ a large development set with labels as input (e.g., single numbers or one-hot vector). In this work, we propose a diffusion model based system for Foley sound generation with text conditions. To alleviate the data scarcity issue, our model is initially pre-trained with large-scale… ▽ More

    Submitted 18 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: Submit to DCASE-workshop 2023, an extension and supersedes the previous technical report arXiv:2305.15905

  43. arXiv:2305.17896  [pdf, other

    eess.SP

    Continuous and Noninvasive Measurement of Arterial Pulse Pressure and Pressure Waveform using an Image-free Ultrasound System

    Authors: Lirui Xu, Pang Wu, Pan Xia, Fanglin Geng, Peng Wang, Xianxiang Chen, Zhenfeng Li, Lidong Du, Shuping Liu, Li Li, Hongbo Chang, Zhen Fang

    Abstract: The local beat-to-beat local pulse pressure (PP) and blood pressure waveform of arteries, especially central arteries, are important indicators of the course of cardiovascular diseases (CVDs). Nevertheless, noninvasive measurement of them remains a challenge in the clinic. This work presents a three-element image-free ultrasound system with a low-computational method for real-time measurement of l… ▽ More

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 13 pages, 12 figures

  44. arXiv:2305.17499  [pdf, other

    cs.CL cs.MM eess.AS

    CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training

    Authors: Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Zejun Ma

    Abstract: Speech or text representation generated by pre-trained models contains modal-specific information that could be combined for benefiting spoken language understanding (SLU) tasks. In this work, we propose a novel pre-training paradigm termed Continuous Integrate-and-Fire Pre-Training (CIF-PT). It relies on a simple but effective frame-to-token alignment: continuous integrate-and-fire (CIF) to bridg… ▽ More

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 Findings

  45. arXiv:2305.00383  [pdf, other

    cs.IT eess.SP

    Edge Learning for Large-Scale Internet of Things With Task-Oriented Efficient Communication

    Authors: Haihui Xie, Minghua Xia, Peiran Wu, Shuai Wang, H. Vincent Poor

    Abstract: In the Internet of Things (IoT) networks, edge learning for data-driven tasks provides intelligent applications and services. As the network size becomes large, different users may generate distinct datasets. Thus, to suit multiple edge learning tasks for large-scale IoT networks, this paper performs efficient communication under the task-oriented principle by using the collaborative design of wir… ▽ More

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 16 pages, 8 figures; accepted for publication in IEEE TWC

  46. arXiv:2302.06774  [pdf, other

    eess.AS cs.SD

    Speaker-Independent Acoustic-to-Articulatory Speech Inversion

    Authors: Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: To build speech processing methods that can handle speech as naturally as humans, researchers have explored multiple ways of building an invertible mapping from speech to an interpretable space. The articulatory space is a promising inversion target, since this space captures the mechanics of speech production. To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages… ▽ More

    Submitted 24 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  47. arXiv:2211.00968  [pdf, ps, other

    cs.CL cs.SD eess.AS

    Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation

    Authors: Rao Ma, Xiaobo Wu, Jin Qiu, Yanan Qin, Haihua Xu, Peihao Wu, Zejun Ma

    Abstract: ASR model deployment environment is ever-changing, and the incoming speech can be switched across different domains during a session. This brings a challenge for effective domain adaptation when only target domain text data is available, and our objective is to obtain obviously improved performance on the target domain while the performance on the general domain is less undermined. In this paper,… ▽ More

    Submitted 2 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023

  48. arXiv:2210.15272  [pdf, ps, other

    eess.AS cs.SD eess.SP

    A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution

    Authors: Yisi Liu, Peter Wu, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Estimation of fundamental frequency (F0) in voiced segments of speech signals, also known as pitch tracking, plays a crucial role in pitch synchronous speech analysis, speech synthesis, and speech manipulation. In this paper, we capitalize on the high time and frequency resolution of the pseudo Wigner-Ville distribution (PWVD) and propose a new PWVD-based pitch estimation method. We devise an effi… ▽ More

    Submitted 27 October, 2022; originally announced October 2022.

  49. arXiv:2210.15173  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Articulation GAN: Unsupervised modeling of articulatory learning

    Authors: Gašper Beguš, Alan Zhou, Peter Wu, Gopala K Anumanchipalli

    Abstract: Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new un… ▽ More

    Submitted 12 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

  50. Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

    Authors: Cheol Jun Cho, Peter Wu, Abdelrahman Mohamed, Gopala K. Anumanchipalli

    Abstract: Recent self-supervised learning (SSL) models have proven to learn rich representations of speech, which can readily be utilized by diverse downstream tasks. To understand such utilities, various analyses have been done for speech SSL models to reveal which and how information is encoded in the learned representations. Although the scope of previous analyses is extensive in acoustic, phonetic, and… ▽ More

    Submitted 20 July, 2023; v1 submitted 21 October, 2022; originally announced October 2022.