Showing 1–50 of 134 results for author: Deng, Y

Searching in archive eess.

  1. arXiv:2505.05504  [pdf, other]

    eess.IV cs.CV

    Image Restoration via Multi-domain Learning

    Authors: Xingyu Jiang, Ning Gao, Xiuhui Zhang, Hongkun Dou, Shaowen Fu, Xiaoqing Zhong, Hongjue Li, Yue Deng

    Abstract: Due to adverse atmospheric and imaging conditions, natural images suffer from various degradation phenomena. Consequently, image restoration has emerged as a key solution and garnered substantial attention. Although recent Transformer architectures have demonstrated impressive success across various restoration tasks, their considerable model complexity poses significant challenges for both traini…

    Submitted 7 May, 2025; originally announced May 2025.

  2. arXiv:2505.02439  [pdf, ps, other]

    cs.AI cs.LG eess.SY

    ReeM: Ensemble Building Thermodynamics Model for Efficient HVAC Control via Hierarchical Reinforcement Learning

    Authors: Yang Deng, Yaohui Liu, Rui Liang, Dafang Zhao, Donghua Xie, Ittetsu Taniguchi, Dan Wang

    Abstract: The building thermodynamics model, which predicts real-time indoor temperature changes under potential HVAC (Heating, Ventilation, and Air Conditioning) control operations, is crucial for optimizing HVAC control in buildings. While pioneering studies have attempted to develop such models for various building environments, these models often require extensive data collection periods and rely heavil…

    Submitted 5 May, 2025; originally announced May 2025.

  3. arXiv:2505.01655  [pdf, other]

    eess.SP

    Understanding the Mechanisms Behind Structural Influences on Link Prediction: A Case Study on FB15k-237

    Authors: Xiaobo Jiang, Yadong Deng

    Abstract: FB15k-237 mitigates the data leakage issue by excluding inverse and symmetric relationship triples; however, this has led to substantial performance degradation and slow improvement progress. Traditional approaches demonstrate limited effectiveness on FB15k-237, primarily because the underlying mechanism by which structural features of the dataset influence model performance remains unexplored. To…

    Submitted 2 May, 2025; originally announced May 2025.

  4. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  5. arXiv:2504.10686  [pdf, other]

    cs.CV eess.IV

    The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

  6. arXiv:2504.05729  [pdf, other]

    cs.IT eess.SP eess.SY

    Robust and Efficient Average Consensus with Non-Coherent Over-the-Air Aggregation

    Authors: Yuhang Deng, Zheng Chen, Erik G. Larsson

    Abstract: Non-coherent over-the-air (OTA) computation has garnered increasing attention for its advantages in facilitating information aggregation among distributed agents in resource-constrained networks without requiring precise channel estimation. A promising application scenario of this method is distributed average consensus in wireless multi-agent systems. However, in such a scenario, non-coherent inter…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 6 pages, 3 figures, accepted in IEEE ICC 2025

  7. arXiv:2503.17708  [pdf, other]

    cs.NI eess.SP

    RAISE: Optimizing RIS Placement to Maximize Task Throughput in Multi-Server Vehicular Edge Computing

    Authors: Yanan Ma, Zhengru Fang, Longzhi Yuan, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Given the limited computing capabilities on autonomous vehicles, onboard processing of large volumes of latency-sensitive tasks presents significant challenges. While vehicular edge computing (VEC) has emerged as a solution, offloading data-intensive tasks to roadside servers or other vehicles is hindered by large obstacles like trucks/buses and the surge in service demands during rush hours. To a…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: 14 pages, 10 figures

  8. arXiv:2503.11229  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Exploring the Potential of Large Multimodal Models as Effective Alternatives for Pronunciation Assessment

    Authors: Ke Wang, Lei He, Kun Liu, Yan Deng, Wenning Wei, Sheng Zhao

    Abstract: Large Multimodal Models (LMMs) have demonstrated exceptional performance across a wide range of domains. This paper explores their potential in pronunciation assessment tasks, with a particular focus on evaluating the capabilities of the Generative Pre-trained Transformer (GPT) model, specifically GPT-4o. Our study investigates its ability to process speech and audio for pronunciation assessment a…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  9. Establishment and Solution of a Multi-Stage Decision Model Based on Hypothesis Testing and Dynamic Programming Algorithm

    Authors: Ziyang Liu, Yurui Hu, Yihan Deng

    Abstract: This paper introduces a novel multi-stage decision-making model that integrates hypothesis testing and dynamic programming algorithms to address complex decision-making scenarios. Initially, we develop a sampling inspection scheme that controls for both Type I and Type II errors using a simple random sampling method without replacement, ensuring the randomness and representativeness of the sample whi…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 7 pages, 2 figures, published by ICIRDC 2024

    Journal ref: Proc. ICIRDC 2024, pp. 883-884, ISBN 979-8-3315-3405-9 (2024)

  10. arXiv:2502.20927  [pdf, other]

    eess.IV

    Goal-Oriented Semantic Communication for Wireless Video Transmission via Generative AI

    Authors: Nan Li, Yansha Deng, Dusit Niyato

    Abstract: Efficient video transmission is essential for seamless communication and collaboration within the visually-driven digital landscape. To achieve low latency and high-quality video transmission over a bandwidth-constrained noisy wireless channel, we propose a stable diffusion (SD)-based goal-oriented semantic communication (GSC) framework. In this framework, we first design a semantic encoder that e…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: Submitted to IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2408.00428

  11. arXiv:2502.20668  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    OpenEarthSensing: Large-Scale Fine-Grained Benchmark for Open-World Remote Sensing

    Authors: Xiang Xiang, Zhuo Xu, Yao Deng, Qinhao Zhou, Yifan Liang, Ke Chen, Qingfang Zheng, Yaowei Wang, Xilin Chen, Wen Gao

    Abstract: In open-world remote sensing, deployed models must continuously adapt to a steady influx of new data, which often exhibits various shifts compared to what the model encountered during the training phase. To effectively handle the new data, models are required to detect semantic shifts, adapt to covariate shifts, and continuously update themselves. These challenges give rise to a variety of open-wo…

    Submitted 27 February, 2025; originally announced February 2025.

  12. arXiv:2501.06552  [pdf, other]

    eess.SP cs.IT eess.SY

    When xURLLC Meets NOMA: A Stochastic Network Calculus Perspective

    Authors: Yuang Chen, Hancheng Lu, Langtin Qin, Yansha Deng, Arumugam Nallanathan

    Abstract: The advent of next-generation ultra-reliable and low-latency communications (xURLLC) presents stringent and unprecedented requirements for key performance indicators (KPIs). As a disruptive technology, non-orthogonal multiple access (NOMA) harbors the potential to fulfill these stringent KPIs essential for xURLLC. However, the immaturity of research on the tail distributions of these KPIs signific…

    Submitted 11 January, 2025; originally announced January 2025.

    Comments: 7 pages, 5 figures, accepted by IEEE Communications Magazine

  13. arXiv:2412.12677  [pdf, ps, other]

    eess.SP

    A Simplified Algorithm for Joint Real-Time Synchronization, NLoS Identification, and Multi-Agent Localization

    Authors: Yili Deng, Jie Fan, Jiguang He, Baojia Luo, Miaomiao Dong, Zhongyi Huang

    Abstract: Real-time, high-precision localization in large-scale wireless networks faces two primary challenges: clock offsets caused by network asynchrony and non-line-of-sight (NLoS) conditions. To tackle these challenges, we propose a low-complexity real-time algorithm for joint synchronization and NLoS identification-based localization. For precise synchronization, we resolve clock offsets based on accum…

    Submitted 17 December, 2024; originally announced December 2024.

  14. arXiv:2412.10985  [pdf, other]

    eess.IV cs.CV

    MorphiNet: A Graph Subdivision Network for Adaptive Bi-ventricle Surface Reconstruction

    Authors: Yu Deng, Yiyang Xu, Linglong Qian, Charlene Mauger, Anastasia Nasopoulou, Steven Williams, Michelle Williams, Steven Niederer, David Newby, Andrew McCulloch, Jeff Omens, Kuberan Pushprajah, Alistair Young

    Abstract: Cardiac Magnetic Resonance (CMR) imaging is widely used for heart modelling and digital twin computational analysis due to its ability to visualize soft tissues and capture dynamic functions. However, the anisotropic nature of CMR images, characterized by large inter-slice distances and misalignments from cardiac motion, poses significant challenges to accurate model reconstruction. These limitati…

    Submitted 14 December, 2024; originally announced December 2024.

  15. arXiv:2412.06330  [pdf, ps, other]

    eess.SP

    A reconfigurable calibration-free digital-to-time converter based on a high-speed transceiver

    Authors: Dexuan Kong, Zaiming Fu, Yujie Deng, Ruiqi Wang

    Abstract: This paper proposes a high-speed transceiver-based method for implementing a digital-to-time converter (DTC). A real-time decoding technique is introduced to inject time information into high-speed pattern data. The stability of the high-speed clock ensures the high precision of the synthesized timing signal without the need for calibration. The reconfigurability of the clock resources provides th…

    Submitted 10 December, 2024; v1 submitted 9 December, 2024; originally announced December 2024.

  16. arXiv:2411.17552  [pdf, other]

    eess.SY

    Ensuring Safety in Target Pursuit Control: A CBF-Safe Reinforcement Learning Approach

    Authors: Yaosheng Deng, Junjie Gao, Jiaping Xiao, Mir Feroskhan

    Abstract: This paper addresses the target-pursuit problem, aiming to ensure each pursuer's safety regarding collision avoidance, sensing range, and input saturation. An input-constrained CBF is proposed to dynamically regulate the pursuer's control, ensuring effective target pursuit even when the target performs evasive maneuvers. To further ensure safety, two sets of CBF constraints are designed to regulat…

    Submitted 10 December, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

    Comments: 12 pages

  17. arXiv:2411.16187  [pdf, other]

    eess.SY eess.SP

    Goal-oriented Semantic Communications for Metaverse Construction via Generative AI and Optimal Transport

    Authors: Zhe Wang, Nan Li, Yansha Deng, A. Hamid Aghvami

    Abstract: The emergence of the metaverse has boosted productivity and creativity, driving real-time updates and personalized content, which will substantially increase data traffic. However, current bit-oriented communication networks struggle to manage this high volume of dynamic information, restricting metaverse applications' interactivity. To address this research gap, we propose a goal-oriented semantic…

    Submitted 30 March, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

  18. arXiv:2411.13042  [pdf, other]

    cs.CV eess.IV

    Attentive Contextual Attention for Cloud Removal

    Authors: Wenli Huang, Ye Deng, Yang Wu, Jinjun Wang

    Abstract: Cloud cover can significantly hinder the use of remote sensing images for Earth observation, prompting urgent advancements in cloud removal technology. Recently, deep learning strategies have shown strong potential in restoring cloud-obscured areas. These methods utilize convolution to extract intricate local features and attention mechanisms to gather long-range information, improving the overall…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: 13 pages, 7 figures

  19. arXiv:2411.08835  [pdf, other]

    cs.RO eess.SY

    Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections

    Authors: Shutong Chen, Emmanouil Spyrakos-Papastavridis, Yichao Jin, Yansha Deng

    Abstract: As one of the most promising technologies in industry, the Digital Twin (DT) facilitates real-time monitoring and predictive analysis for real-world systems by precisely reconstructing virtual replicas of physical entities. However, this reconstruction faces unprecedented challenges due to the ever-increasing communication overhead, especially for digital robot arm reconstruction. To this end, we p…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: Submitted to IEEE for potential publication

  20. arXiv:2411.02452  [pdf, other]

    cs.CV eess.IV

    Goal-Oriented Semantic Communication for Wireless Visual Question Answering

    Authors: Sige Liu, Nan Li, Yansha Deng, Tony Q. S. Quek

    Abstract: The rapid progress of artificial intelligence (AI) and computer vision (CV) has facilitated the development of computation-intensive applications like Visual Question Answering (VQA), which integrates visual perception and natural language processing to generate answers. To overcome the limitations of traditional VQA constrained by local computation resources, edge computing has been incorporated…

    Submitted 27 November, 2024; v1 submitted 3 November, 2024; originally announced November 2024.

  21. arXiv:2410.19615  [pdf, other]

    cs.RO eess.SY

    Equilibrium Adaptation-Based Control for Track Stand of Single-Track Two-Wheeled Robots

    Authors: Boyi Wang, Yang Deng, Feilong Jing, Yiyong Sun, Zhang Chen, Bin Liang

    Abstract: Stationary balance control is challenging for single-track two-wheeled (STTW) robots due to the lack of elegant balancing mechanisms and the conflict between the limited attraction domain and external disturbances. To address the absence of balancing mechanisms, we draw inspiration from cyclists and leverage the track stand maneuver, which relies solely on steering and rear-wheel actuation. To ach…

    Submitted 7 November, 2024; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: 11 pages, 7 figures

  22. arXiv:2410.02592  [pdf, other]

    cs.CV cs.AI cs.LG eess.SY

    IC3M: In-Car Multimodal Multi-object Monitoring for Abnormal Status of Both Driver and Passengers

    Authors: Zihan Fang, Zheng Lin, Senkang Hu, Hangcheng Cao, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Recently, in-car monitoring has emerged as a promising technology for detecting early-stage abnormal status of the driver and providing timely alerts to prevent traffic accidents. Although training models with multimodal data enhances the reliability of abnormal status detection, the scarcity of labeled data and the imbalance of class distribution impede the extraction of critical abnormal state f…

    Submitted 21 November, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: 16 pages, 17 figures

  23. arXiv:2409.09214  [pdf, other]

    cs.SD eess.AS

    Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

    Authors: Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang, Dongya Jia, Feihu La, Duc Le, Bochen Li, Chumin Li, Hui Li, Xingxing Li, Shouda Liu, Wei-Tsung Lu, Yiqing Lu, Andrew Shaw, Janne Spijkervet, Yakun Sun, Bo Wang, Ju-Chiang Wang, et al. (13 additional authors not shown)

    Abstract: We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music gene…

    Submitted 19 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Seed-Music technical report, 20 pages, 5 figures

  24. Smart CSI Processing for Accurate Commodity WiFi-based Humidity Sensing

    Authors: Yirui Deng, Deepak Mishra, Shaghik Atakaramians, Aruna Seneviratne

    Abstract: Indoor humidity is a crucial factor affecting people's health and well-being. Wireless humidity sensing techniques are scalable and low-cost, making them a promising solution for measuring humidity in indoor environments without requiring additional devices. As such, machine learning (ML) assisted WiFi sensing is being envisioned as the key enabler for integrated sensing and communication (ISAC). How…

    Submitted 12 September, 2024; originally announced September 2024.

  25. arXiv:2409.00565  [pdf, other]

    cs.LG cs.CV eess.SP

    Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging

    Authors: Yangfan Deng, Hamad Albidah, Ahmed Dallal, Jijun Yin, Zhi-Hong Mao

    Abstract: Sleep is crucial for human health, and EEG signals play a significant role in sleep research. Due to the high-dimensional nature of EEG signal data sequences, data visualization and clustering of different sleep stages have been challenges. To address these issues, we propose a two-stage hierarchical and explainable feature selection framework by incorporating a feature selection algorithm to impr…

    Submitted 31 August, 2024; originally announced September 2024.

  26. arXiv:2408.09534  [pdf, ps, other]

    eess.SY

    Safe Adaptive Control for Uncertain Systems with Complex Input Constraints

    Authors: Yaosheng Deng, Yang Bai, Yujie Wang, Masaki Ogura, Mir Feroskhan

    Abstract: In this paper, we propose a novel adaptive Control Barrier Function (CBF) based controller for nonlinear systems with complex, time-varying input constraints. Conventional CBF approaches often struggle with feasibility issues and stringent assumptions when addressing input constraints. Unlike these methods, our approach converts the input-constraint problem into an output-constraint CBF design. Th…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures

  27. arXiv:2408.04358  [pdf, other]

    eess.SY

    Goal-Oriented UAV Communication Design and Optimization for Target Tracking: A Machine Learning Approach

    Authors: Wenchao Wu, Yanning Wu, Yuanqing Yang, Yansha Deng

    Abstract: To accomplish various tasks, safe and smooth control of unmanned aerial vehicles (UAVs) needs to be guaranteed, which cannot be met by existing ultra-reliable low latency communications (URLLC). This has attracted the attention of the communication field, where most existing work mainly focused on optimizing communication performance (i.e., delay) and ignored the performance of the task (i.e., tra…

    Submitted 8 August, 2024; originally announced August 2024.

  28. arXiv:2408.03646  [pdf, other]

    eess.SY

    Goal-oriented Semantic Communication for the Metaverse Application

    Authors: Zhe Wang, Nan Li, Yansha Deng

    Abstract: With the emergence of the metaverse and its role in enabling real-time simulation and analysis of real-world counterparts, an increasing number of personalized metaverse scenarios are being created to influence entertainment experiences and social behaviors. However, compared to traditional image and video entertainment applications, the exact transmission of the vast amount of metaverse-associate…

    Submitted 7 August, 2024; originally announced August 2024.

  29. arXiv:2408.00428  [pdf, other]

    eess.IV

    Goal-Oriented Semantic Communication for Wireless Image Transmission via Stable Diffusion

    Authors: Nan Li, Yansha Deng

    Abstract: Efficient image transmission is essential for seamless communication and collaboration within the visually-driven digital landscape. To achieve low latency and high-quality image reconstruction over a bandwidth-constrained noisy wireless channel, we propose a stable diffusion (SD)-based goal-oriented semantic communication (GSC) framework. In this framework, we design a semantic autoencoder that e…

    Submitted 28 February, 2025; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE ICC 2025

  30. arXiv:2408.00407  [pdf, other]

    eess.SY

    Task-oriented and Semantics-aware Communications for Augmented Reality

    Authors: Zhe Wang, Yansha Deng

    Abstract: Upon the advent of the emerging metaverse and its related applications in Augmented Reality (AR), the current bit-oriented network struggles to support real-time changes for the vast amount of associated information, creating a significant bottleneck in its development. To address the above problem, we present a novel task-oriented and semantics-aware communication framework for augmented reality…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.15470

  31. arXiv:2407.14894  [pdf, other]

    eess.SY

    A Holistic Optimization Framework for Energy Efficient UAV-assisted Fog Computing: Attitude Control, Trajectory Planning and Task Assignment

    Authors: Shuaijun Liu, Jinqiu Du, Yaxin Zheng, Jiaying Yin, Yuhui Deng, Jingjin Wu

    Abstract: Unmanned Aerial Vehicles (UAVs) have significantly enhanced fog computing by acting as both flexible computation platforms and communication mobile relays. In this paper, we propose a holistic framework that jointly optimizes the total latency and energy consumption for UAV-assisted fog computing in a three-dimensional spatial domain with varying terrain elevations and dynamic task generations. Ou…

    Submitted 5 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  32. arXiv:2406.11364  [pdf, other]

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  33. arXiv:2406.03714  [pdf, other]

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  34. arXiv:2406.03706  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  35. arXiv:2405.00603  [pdf, other]

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is the task of transforming the voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units lacks use. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  36. arXiv:2404.01654  [pdf, other]

    cs.CV cs.AI eess.IV eess.SP

    AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease

    Authors: Xiang Xiang, Zihan Zhang, Jing Ma, Yao Deng

    Abstract: Parkinson's Disease (PD) is the second most common neurodegenerative disorder. The existing assessment method for PD is usually the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to assess the severity of various types of motor symptoms and disease progression. However, manual assessment suffers from high subjectivity, lack of consistency, and high cost and low ef…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Technical report for AI WALKUP, an APP winning 3rd Prize of 2022 HUST GS AI Innovation and Design Competition

  37. arXiv:2403.17392  [pdf, other]

    cs.RO eess.SY nlin.AO

    Swarm navigation of cyborg-insects in unknown obstructed soft terrain

    Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

    Abstract: Cyborg insects refer to hybrid robots that integrate living insects with miniature electronic controllers to enable robotic-like programmable control. These creatures exhibit advantages over conventional robots in adaption to complex terrain and sustained energy efficiency. Nevertheless, there is a lack of literature on the control of multi-cyborg systems. This research gap is due to the difficult…

    Submitted 21 December, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  38. arXiv:2402.16027  [pdf, other]

    cs.IT eess.SP

    Enhancing xURLLC with RSMA-Assisted Massive-MIMO Networks: Performance Analysis and Optimization

    Authors: Yuang Chen, Hancheng Lu, Chenwu Zhang, Yansha Deng, Arumugam Nallanathan

    Abstract: Massive interconnection has sparked people's envisioning for next-generation ultra-reliable and low-latency communications (xURLLC), prompting the design of customized next-generation advanced transceivers (NGAT). Rate-splitting multiple access (RSMA) has emerged as a pivotal technology for NGAT design, given its robustness to imperfect channel state information (CSI) and resilience to quality of…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 14 pages, 11 figures, Submitted to IEEE for potential publication

  39. arXiv:2402.11478  [pdf, other]

    eess.SY

    Federated Reinforcement Learning for Uplink Centric Broadband Communication Optimization over Unlicensed Spectrum

    Authors: Hui Zhou, Yansha Deng

    Abstract: To provide Uplink Centric Broadband Communication (UCBC), New Radio Unlicensed (NR-U) network has been standardized to exploit the unlicensed spectrum using Listen Before Talk (LBT) scheme to fairly coexist with the incumbent Wireless Fidelity (WiFi) network. Existing access schemes over unlicensed spectrum are required to perform Clear Channel Assessment (CCA) before transmissions, where fixed En…

    Submitted 18 February, 2024; originally announced February 2024.

  40. arXiv:2401.08096  [pdf, other]

    cs.SD eess.AS

    Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

    Abstract: Voice conversion refers to transferring speaker identity with well-preserved content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from input audio has the potential ability to well represent content. Besides, the speaker-style modeling with pre-trained models makes the process more complex. To tackle these…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)

  41. arXiv:2401.01544  [pdf, other]

    cs.CV eess.SP

    Collaborative Perception for Connected and Autonomous Driving: Challenges, Possible Solutions and Opportunities

    Authors: Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Autonomous driving has attracted significant attention from both academia and industries, which is expected to offer a safer and more efficient driving system. However, current autonomous driving systems are mostly based on a single vehicle, which has significant limitations which still poses threats to driving safety. Collaborative perception with connected and autonomous vehicles (CAVs) shows a…

    Submitted 15 April, 2025; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Wireless Communications Magazine (WCM)

  42. arXiv:2401.01044  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

    Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent advancements in diffusion models and large language models (LLMs) have significantly propelled the field of AIGC. Text-to-Audio (TTA), a burgeoning AIGC application designed to generate audio from natural language prompts, is attracting increasing attention. However, existing TTA studies often struggle with generation quality and text-audio alignment, especially for complex textual inputs.…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Demo and implementation at https://auffusion.github.io

  43. arXiv:2312.16383  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Frame-level emotional state alignment method for speech emotion recognition

    Authors: Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li

    Abstract: Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective states consistent with utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and perform poorly. To address th…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  44. Goal-oriented Semantic Communications for Robotic Waypoint Transmission: The Value and Age of Information Approach

    Authors: Wenchao Wu, Yuanqing Yang, Yansha Deng, A. Hamid Aghvami

    Abstract: The ultra-reliable and low-latency communication (URLLC) service of the fifth-generation (5G) mobile communication network struggles to support safe robot operation. Nowadays, the sixth-generation (6G) mobile communication network is proposed to provide hyper-reliable and low-latency communication to enable safer control for robots. However, current 5G/ 6G research mainly focused on improving comm…

    Submitted 12 November, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: The paper has been accepted in IEEE TWC

  45. arXiv:2312.12358  [pdf, other]

    cs.IT eess.SP

    Localization and Discrete Beamforming with a Large Reconfigurable Intelligent Surface

    Authors: Baojia Luo, Yili Deng, Miaomiao Dong, Zhongyi Huang, Xiang Chen, Wei Han, Bo Bai

    Abstract: In millimeter-wave (mmWave) cellular systems, reconfigurable intelligent surfaces (RISs) are foreseeably deployed with a large number of reflecting elements to achieve high beamforming gains. The large-sized RIS will make radio links fall in the near-field localization regime with spatial non-stationarity issues. Moreover, the discrete phase restriction on the RIS reflection coefficient incurs exp…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 13 pages

  46. arXiv:2311.08670  [pdf, other]

    cs.SD eess.AS

    CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation

    Authors: Yimin Deng, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Better disentanglement of speech representation is essential to improve the quality of voice conversion. Recently contrastive learning is applied to voice conversion successfully based on speaker labels. However, the performance of model will reduce in conversion between similar speakers. Hence, we propose an augmented negative sample selection to address the issue. Specifically, we create hard ne…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023)

  47. arXiv:2310.07062  [pdf, other]

    cs.SD cs.LG eess.AS

    Acoustic Model Fusion for End-to-end Speech Recognition

    Authors: Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng, Man-Hung Siu

    Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, tr…

    Submitted 10 October, 2023; originally announced October 2023.

  48. PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

    Authors: Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion as the style transfer task applied to speech, refers to converting one person's speech into a new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC tasks. However, a good voice conversion model should not only match the timbre information of the target speaker, but also expressive information such as prosod…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by the 31st ACM International Conference on Multimedia (MM2023)

  49. arXiv:2306.14228  [pdf, ps, other]

    eess.SY eess.SP

    Task-Oriented Semantics-Aware Communication for Wireless UAV Control and Command Transmission

    Authors: Yujie Xu, Zhou Hui, Yansha Deng

    Abstract: To guarantee the safety and smooth control of Unmanned Aerial Vehicle (UAV) operation, the new control and command (C&C) data type imposes stringent quality of service (QoS) requirements on the cellular network. However, the existing bit-oriented communication framework is already approaching the Shannon capacity limit, which can hardly guarantee the ultra-reliable low latency communications (URLL…

    Submitted 25 June, 2023; originally announced June 2023.

  50. DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences

    Authors: Wentao Liu, Tong Tian, Lemeng Wang, Weijin Xu, Lei Li, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su

    Abstract: The automated segmentation of Intracranial Arteries (IA) in Digital Subtraction Angiography (DSA) plays a crucial role in the quantification of vascular morphology, significantly contributing to computer-assisted stroke research and clinical practice. Current research primarily focuses on the segmentation of single-frame DSA using proprietary datasets. However, these methods face challenges due to…

    Submitted 13 June, 2024; v1 submitted 21 June, 2023; originally announced June 2023.