Showing 1–50 of 145 results for author: Yao, J

Searching in archive eess.
  1. arXiv:2507.00660  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    MTCNet: Motion and Topology Consistency Guided Learning for Mitral Valve Segmentation in 4D Ultrasound

    Authors: Rusi Chen, Yuanting Yang, Jiezhi Yao, Hongning Song, Ji Zhang, Yongsong Zhou, Yuhao Huang, Ronghao Yang, Dan Jia, Yuhan Zhang, Xing Tao, Haoran Dou, Qing Zhou, Xin Yang, Dong Ni

    Abstract: Mitral regurgitation is one of the most prevalent cardiac disorders. Four-dimensional (4D) ultrasound has emerged as the primary imaging modality for assessing dynamic valvular morphology. However, 4D mitral valve (MV) analysis remains challenging due to limited phase annotations, severe motion artifacts, and poor imaging quality. Yet, the absence of inter-phase dependency in existing methods hind…

    Submitted 3 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by MICCAI 2025

  2. arXiv:2507.00185  [pdf]

    eess.IV cs.AI cs.CV

    Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

    Authors: Yang Zhou, Chrystie Wan Ning Quek, Jun Zhou, Yan Wang, Yang Bai, Yuhe Ke, Jie Yao, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting

    Abstract: Current artificial intelligence models for medical imaging are predominantly single modality and single disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundatio…

    Submitted 30 June, 2025; originally announced July 2025.

    Comments: 42 pages, 3 composite figures, 4 tables

  3. arXiv:2506.23986  [pdf, ps, other]

    cs.SD eess.AS

    StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding

    Authors: Dake Guo, Jixun Yao, Linhan Ma, He Wang, Lei Xie

    Abstract: Recent advancements in discrete token-based speech generation have highlighted the importance of token-to-waveform generation for audio quality, particularly in real-time interactions. Traditional frameworks integrating semantic tokens with flow matching (FM) struggle with streaming capabilities due to their reliance on a global receptive field. Additionally, directly implementing token-by-token s…

    Submitted 1 July, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

  4. arXiv:2506.12325  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition

    Authors: Yuntao Shou, Jun Yao, Tao Meng, Wei Ai, Cen Chen, Keqin Li

    Abstract: Multimodal emotion recognition in conversations (MERC) aims to infer the speaker's emotional state by analyzing utterance information from multiple sources (i.e., video, audio, and text). Compared with unimodality, a more robust utterance representation can be obtained by fusing complementary semantic information from different modalities. However, the modality missing problem severely limits the…

    Submitted 13 June, 2025; originally announced June 2025.

  5. arXiv:2506.01023  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement

    Authors: Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li

    Abstract: This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its su…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 5 pages, 2 figures, accepted by Interspeech 2025

  6. Blind Passive Beamforming for MIMO System

    Authors: Wenhai Lai, Jiawei Yao, Kaiming Shen

    Abstract: Passive beamforming for the intelligent surface (IS)-aided multiple-input multiple-output (MIMO) communication is a difficult nonconvex problem. It becomes even more challenging under the practical discrete constraints on phase shifts. Unlike most of the existing approaches that rely on the channel state information (CSI), this work advocates a blind beamforming strategy without any CSI. Simply pu…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 6 pages

    Journal ref: IEEE Wireless Communications Letters 2025

  7. arXiv:2505.15004  [pdf, ps, other]

    eess.AS cs.SD

    EASY: Emotion-aware Speaker Anonymization via Factorized Distillation

    Authors: Jixun Yao, Hexin Liu, Eng Siong Chng, Lei Xie

    Abstract: Emotion plays a significant role in speech interaction, conveyed through tone, pitch, and rhythm, enabling the expression of feelings and intentions beyond words to create a more personalized experience. However, most existing speaker anonymization systems employ parallel disentanglement methods, which only separate speech into linguistic content and speaker identity, often neglecting the preserva…

    Submitted 30 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by INTERSPEECH 2025

  8. arXiv:2505.13805  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech

    Authors: Yu Pan, Yanni Hu, Yuguang Yang, Jixun Yao, Jianhao Ye, Hongbin Zhou, Lei Ma, Jianjun Zhao

    Abstract: Despite great advances, achieving high-fidelity emotional voice conversion (EVC) with flexible and interpretable control remains challenging. This paper introduces ClapFM-EVC, a novel EVC framework capable of generating high-quality converted speech driven by natural language prompts or reference speech with adjustable emotion intensity. We first propose EVC-CLAP, an emotional contrastive language…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by InterSpeech 2025

  9. arXiv:2505.10793  [pdf, ps, other]

    eess.AS

    SongEval: A Benchmark Dataset for Song Aesthetics Evaluation

    Authors: Jixun Yao, Guobin Ma, Huixin Xue, Huakang Chen, Chunbo Hao, Yuepeng Jiang, Haohe Liu, Ruibin Yuan, Jin Xu, Wei Xue, Hao Liu, Lei Xie

    Abstract: Aesthetics serve as an implicit and important criterion in song generation tasks that reflect human perception beyond objective metrics. However, evaluating the aesthetics of generated songs remains a fundamental challenge, as the appreciation of music is highly subjective. Existing evaluation metrics, such as embedding-based distances, are limited in reflecting the subjective and perceptual aspec…

    Submitted 15 May, 2025; originally announced May 2025.

  10. arXiv:2505.09141  [pdf, ps, other]

    eess.SP

    Sensing-Assisted Channel Prediction in Complex Wireless Environments: An LLM-Based Approach

    Authors: Junjie He, Zixiang Ren, Jianping Yao, Han Hu, Tony Xiao Han, Jie Xu

    Abstract: This letter studies the sensing-assisted channel prediction for a multi-antenna orthogonal frequency division multiplexing (OFDM) system operating in realistic and complex wireless environments. In this system, an integrated sensing and communication (ISAC) transmitter leverages the mono-static sensing capability to facilitate the prediction of its bi-static communication channel, by exploiting the…

    Submitted 14 May, 2025; originally announced May 2025.

  11. arXiv:2503.19368   

    eess.SP

    RIS-Assisted Passive Localization (RAPL): An Efficient Zero-Overhead Framework Using Conditional Sample Mean

    Authors: Jiawei Yao, Yijie Mao, Mingzhe Chen, Ye Hu

    Abstract: Reconfigurable Intelligent Surface (RIS) has been recognized as a promising solution for enhancing localization accuracy. Traditional RIS-based localization methods typically rely on prior channel knowledge, beam scanning, and pilot-based assistance. These approaches often result in substantial energy and computational overhead, and require real-time coordination between the base station (BS) and…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

  12. arXiv:2503.18610   

    eess.SP cs.IT

    RIS-Assisted Localization: A Novel Conditional Sample Mean Approach without CSI

    Authors: Jiawei Yao, Yijie Mao, Mingzhe Chen

    Abstract: Reconfigurable intelligent surface (RIS) has been recognized as a promising solution for enhancing localization accuracy. Traditional RIS-based localization methods typically rely on prior channel knowledge, beam scanning, and pilot-based assistance. These approaches often result in substantial energy and computational overhead, and require real-time coordination between the base station (BS) and…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

  13. arXiv:2503.17649  [pdf, ps, other]

    cs.IT eess.SP

    Quantized Analog Beamforming Enabled Multi-task Federated Learning Over-the-air

    Authors: Jiacheng Yao, Wei Xu, Guangxu Zhu, Zhaohui Yang, Kaibin Huang, Dusit Niyato

    Abstract: Over-the-air computation (AirComp) has recently emerged as a pivotal technique for communication-efficient federated learning (FL) in resource-constrained wireless networks. Though AirComp leverages the superposition property of multiple access channels for computation, it inherently limits its ability to manage inter-task interference in multi-task computing. In this paper, we propose a quantized…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE VTC-Spring 2025

  14. arXiv:2503.03560  [pdf, ps, other]

    cs.IT eess.SP

    Optimal Beamforming for Multi-Target Multi-User ISAC Exploiting Prior Information: How Many Sensing Beams Are Needed?

    Authors: Jiayi Yao, Shuowen Zhang

    Abstract: This paper studies a multi-target multi-user integrated sensing and communication (ISAC) system where a multi-antenna base station (BS) communicates with multiple single-antenna users in the downlink and senses the unknown and random angle information of multiple targets based on their reflected echo signals at the BS receiver as well as their prior probability information. We focus on a general b…

    Submitted 28 June, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: This is the longer version of a paper submitted for possible journal publication

  15. arXiv:2503.01183  [pdf, other]

    eess.AS

    DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

    Authors: Ziqian Ning, Huakang Chen, Yuepeng Jiang, Chunbo Hao, Guobin Ma, Shuai Wang, Jixun Yao, Lei Xie

    Abstract: Recent advancements in music generation have garnered significant attention, yet existing approaches face critical limitations. Some current generative models can only synthesize either the vocal track or the accompaniment track. While some models can generate combined vocal and accompaniment, they typically rely on meticulously designed multi-stage cascading architectures and intricate data pipel…

    Submitted 3 March, 2025; originally announced March 2025.

  16. arXiv:2503.00298  [pdf, other]

    cs.IT eess.SP

    Energy-Efficient Edge Inference in Integrated Sensing, Communication, and Computation Networks

    Authors: Jiacheng Yao, Wei Xu, Guangxu Zhu, Kaibin Huang, Shuguang Cui

    Abstract: Task-oriented integrated sensing, communication, and computation (ISCC) is a key technology for achieving low-latency edge inference and enabling efficient implementation of artificial intelligence (AI) in industrial cyber-physical systems (ICPS). However, the constrained energy supply at edge devices has emerged as a critical bottleneck. In this paper, we propose a novel energy-efficient ISCC fra…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Accepted by IEEE JSAC

  17. arXiv:2502.02950  [pdf, other]

    eess.AS cs.SD

    Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech

    Authors: Jixun Yao, Yuguang Yang, Yu Pan, Yuan Feng, Ziqian Ning, Jianhao Ye, Hongbin Zhou, Lei Xie

    Abstract: Integrating human feedback to align text-to-speech (TTS) system outputs with human preferences has proven to be an effective approach for enhancing the robustness of language model-based TTS systems. Current approaches primarily focus on using preference data annotated at the utterance level. However, frequent issues that affect the listening experience often only arise in specific segments of aud…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: WIP

  18. arXiv:2502.02942  [pdf, other]

    eess.AS cs.SD

    GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

    Authors: Jixun Yao, Hexin Liu, Chen Chen, Yuchen Hu, EngSiong Chng, Lei Xie

    Abstract: Semantic information refers to the meaning conveyed through words, phrases, and contextual relationships within a given linguistic structure. Humans can leverage semantic information, such as familiar linguistic patterns and contextual cues, to reconstruct incomplete or masked speech signals in noisy environments. However, existing speech enhancement (SE) approaches often overlook the rich semanti…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: Accepted by ICLR 2025

  19. Combating Interference for Over-the-Air Federated Learning: A Statistical Approach via RIS

    Authors: Wei Shi, Jiacheng Yao, Wei Xu, Jindan Xu, Xiaohu You, Yonina C. Eldar, Chunming Zhao

    Abstract: Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. However, owing to its analog characteristics, AirComp-enabled FL (AirFL) is vulnerable to both unintentional and intentional interference. In this paper, we aim to attain robustness in AirC…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Transactions on Signal Processing

  20. arXiv:2501.05127  [pdf, other]

    cs.SD eess.AS

    DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification

    Authors: Qing Wang, Jixun Yao, Zhaokai Sun, Pengcheng Guo, Lei Xie, John H. L. Hansen

    Abstract: Being a form of biometric identification, the security of the speaker identification (SID) system is of utmost importance. To better understand the robustness of SID systems, we aim to perform more realistic attacks in SID, which are challenging for both humans and machines to detect. In this study, we propose DiffAttack, a novel timbre-reserved adversarial attack approach that exploits the capabi…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: 5 pages, 4 figures, accepted by ICASSP 2025

  21. arXiv:2501.01281  [pdf, other]

    eess.SP

    Towards Intelligent Antenna Positioning: Leveraging DRL for FAS-Aided ISAC Systems

    Authors: Shunxing Yang, Junteng Yao, Jie Tang, Tuo Wu, Maged Elkashlan, Chau Yuen, Merouane Debbah, Hyundong Shin, Matthew Valenti

    Abstract: Fluid antenna systems (FAS) enable dynamic antenna positioning, offering new opportunities to enhance integrated sensing and communication (ISAC) performance. However, existing studies primarily focus on communication enhancement or single-target sensing, leaving multi-target scenarios underexplored. Additionally, the joint optimization of beamforming and antenna positions poses a highly non-conve…

    Submitted 2 January, 2025; originally announced January 2025.

  22. arXiv:2412.19748  [pdf, ps, other]

    cs.IT eess.SP

    UAV-Enabled Secure ISAC Against Dual Eavesdropping Threats: Joint Beamforming and Trajectory Design

    Authors: Jianping Yao, Zeyu Yang, Zai Yang, Jie Xu, Tony Q. S. Quek

    Abstract: In this work, we study an unmanned aerial vehicle (UAV)-enabled secure integrated sensing and communication (ISAC) system, where a UAV serves as an aerial base station (BS) to simultaneously perform communication with a user and detect a target on the ground, while a dual-functional eavesdropper attempts to intercept the signals for both sensing and communication. Facing the dual eavesdropping thr…

    Submitted 27 May, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

    Comments: 8 pages, 6 figures, submitted for possible publication. It overlaps with the former version (arXiv:2412.19748)

  23. arXiv:2412.15843  [pdf, other]

    eess.SP

    Rethinking Hardware Impairments in Multi-User Systems: Can FAS Make a Difference?

    Authors: Junteng Yao, Tuo Wu, Liaoshi Zhou, Ming Jin, Cunhua Pan, Maged Elkashlan, Fumiyuki Adachi, George K. Karagiannidis, Naofal Al-Dhahir, Chau Yuen

    Abstract: In this paper, we analyze the role of fluid antenna systems (FAS) in multi-user systems with hardware impairments (HIs). Specifically, we investigate a scenario where a base station (BS) equipped with multiple fluid antennas communicates with multiple users (CUs), each equipped with a single fluid antenna. Our objective is to maximize the minimum communication rate among all users by jointly optim…

    Submitted 20 December, 2024; originally announced December 2024.

  24. arXiv:2412.10844  [pdf, other]

    eess.SY

    Lyapunov-based reinforcement learning for distributed control with stability guarantee

    Authors: Jingshi Yao, Minghao Han, Xunyuan Yin

    Abstract: In this paper, we propose a Lyapunov-based reinforcement learning method for distributed control of nonlinear systems comprising interacting subsystems with guaranteed closed-loop stability. Specifically, we conduct a detailed stability analysis and derive sufficient conditions that ensure closed-loop stability under a model-free distributed control scheme based on the Lyapunov theorem. The Lyapun…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: 28 pages, 10 figures, journal, Computers and Chemical Engineering

  25. arXiv:2412.04724  [pdf, other]

    eess.AS cs.SD

    StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

    Authors: Jixun Yao, Yuguang Yang, Yu Pan, Ziqian Ning, Jiaohao Ye, Hongbin Zhou, Lei Xie

    Abstract: Zero-shot voice conversion (VC) aims to transfer the timbre from the source speaker to an arbitrary unseen speaker while preserving the original linguistic content. Despite recent advancements in zero-shot VC using language model-based or diffusion-based approaches, several challenges remain: 1) current approaches primarily focus on adapting timbre from unseen speakers and are unable to transfer s…

    Submitted 10 December, 2024; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  26. arXiv:2412.03839  [pdf, other]

    eess.SP

    Fluid Antenna Systems Enabling 6G: Principles, Applications, and Research Directions

    Authors: Tuo Wu, Kangda Zhi, Junteng Yao, Xiazhi Lai, Jianchao Zheng, Hong Niu, Maged Elkashlan, Kai-Kit Wong, Chan-Byoung Chae, Zhiguo Ding, George K. Karagiannidis, Merouane Debbah, Chau Yuen

    Abstract: Fluid antenna system (FAS), as a new version of reconfigurable antenna technologies promoting shape and position flexibility, has emerged as an exciting and possibly transformative technology for wireless communications systems. FAS represents any software-controlled fluidic, conductive or dielectric structure that can dynamically alter the antenna's shape and position to change the gain, the radiation…

    Submitted 4 December, 2024; originally announced December 2024.

  27. arXiv:2411.18918  [pdf, other]

    cs.SD eess.AS

    CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion

    Authors: Yuke Li, Xinfa Zhu, Hanzhao Li, JiXun Yao, WenJie Tian, XiPeng Yang, YunLin Chen, Zhifei Li, Lei Xie

    Abstract: Zero-shot voice conversion (VC) aims to convert the original speaker's timbre to any target speaker while keeping the linguistic content. Current mainstream zero-shot voice conversion approaches depend on pre-trained recognition models to disentangle linguistic content and speaker representation. This results in a timbre residue within the decoupled linguistic content and inadequacies in speaker r…

    Submitted 3 December, 2024; v1 submitted 28 November, 2024; originally announced November 2024.

  28. arXiv:2411.09235  [pdf, ps, other]

    eess.SP

    FAS for Secure and Covert Communications

    Authors: Junteng Yao, Liangxiao Xin, Tuo Wu, Ming Jin, Kai-Kit Wong, Chau Yuen, Hyundong Shin

    Abstract: This letter considers a fluid antenna system (FAS)-aided secure and covert communication system, where the transmitter adjusts multiple fluid antennas' positions to achieve secure and covert transmission under the threat of an eavesdropper and the detection of a warden. This letter aims to maximize the secrecy rate while satisfying the covertness constraint. Unfortunately, the optimization problem…

    Submitted 14 November, 2024; originally announced November 2024.

  29. arXiv:2411.08386  [pdf, ps, other]

    eess.SP

    A Secure Beamforming Design: When Fluid Antenna Meets NOMA

    Authors: Lifeng Mai, Junteng Yao, Jie Tang, Tuo Wu, Kai-Kit Wong, Hyundong Shin, Fumiyuki Adachi

    Abstract: This letter proposes a secure beamforming design for downlink non-orthogonal multiple access (NOMA) systems utilizing fluid antenna systems (FAS). We consider a setup where a base station (BS) with $M$ fluid antennas (FAs) communicates to a cell-center user (CU) and a cell-edge user (CEU), each with a FA. The CU is the intended recipient while the CEU is regarded as a potential eavesdropper. Our a…

    Submitted 13 November, 2024; originally announced November 2024.

  30. arXiv:2411.08383  [pdf, other]

    eess.SP

    FAS-Driven Spectrum Sensing for Cognitive Radio Networks

    Authors: Junteng Yao, Ming Jin, Tuo Wu, Maged Elkashlan, Chau Yuen, Kai-Kit Wong, George K. Karagiannidis, Hyundong Shin

    Abstract: Cognitive radio (CR) networks face significant challenges in spectrum sensing, especially under spectrum scarcity. Fluid antenna systems (FAS) can offer an unorthodox solution due to their ability to dynamically adjust antenna positions for improved channel gain. In this letter, we study a FAS-driven CR setup where a secondary user (SU) adjusts the positions of fluid antennas to detect signals fro…

    Submitted 13 November, 2024; originally announced November 2024.

  31. arXiv:2411.02026  [pdf, other]

    cs.SD cs.AI eess.AS

    CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching

    Authors: Yu Pan, Yuguang Yang, Jixun Yao, Jianhao Ye, Hongbin Zhou, Lei Ma, Jianjun Zhao

    Abstract: Zero-shot voice conversion (VC) aims to transform the timbre of a source speaker into any previously unseen target speaker, while preserving the original linguistic content. Despite notable progress, attaining a degree of speaker similarity and naturalness on par with ground truth recordings continues to pose a great challenge. In this paper, we propose CTEFM-VC, a zero-shot VC framework that levera…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: Work in progress; 5 pages

  32. arXiv:2411.01398  [pdf, ps, other]

    eess.SP

    Paving the Way to 6G: Outage Probability Analysis for FAS-ARIS Systems

    Authors: Jianchao Zheng, Xiazhi Lai, Junteng Yao, Jie Tang, Yijin Pan, Tuo Wu, Chau Yuen

    Abstract: In this paper, we pave the way to sixth-generation (6G) by investigating the outage probability (OP) of fluid antenna system (FAS)-active reconfigurable intelligent surface (ARIS) communication systems. We consider a FAS-ARIS setup consisting of a base station (BS) with a single fixed-position antenna and a receiver equipped with a fluid antenna (FA). Utilizing the block-correlation model, we derive…

    Submitted 2 November, 2024; originally announced November 2024.

  33. arXiv:2410.23815  [pdf, other]

    cs.SD cs.AI eess.AS

    The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge

    Authors: Dake Guo, Jixun Yao, Xinfa Zhu, Kangxiang Xia, Zhao Guo, Ziyu Zhang, Yao Wang, Jie Liu, Lei Xie

    Abstract: This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC). Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2. In Track 1, we employ Single-Codec to tokenize the speech into discrete tokens and use a language-model-based approach to achieve zero-shot speaking…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: accepted by ISCSLP 2024

  34. arXiv:2410.17609  [pdf, other]

    eess.SP

    Exploring the Impact of RIS on Cooperative NOMA URLLC Systems: A Theoretical Perspective

    Authors: Jianchao Zheng, Tuo Wu, Junteng Yao, Chau Yuen, Zhiguo Ding, Fumiyuki Adachi

    Abstract: In this paper, we conduct a theoretical analysis of how to integrate reconfigurable intelligent surfaces (RIS) with cooperative non-orthogonal multiple access (NOMA), considering URLLC. We consider a downlink two-user cooperative NOMA system employing short-packet communications, where the two users are denoted by the central user (CU) and the cell-edge user (CEU), respectively, and an RIS is depl…

    Submitted 23 October, 2024; originally announced October 2024.

  35. NTU-NPU System for Voice Privacy 2024 Challenge

    Authors: Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, Eng Siong Chng

    Abstract: In this work, we describe our submissions for the Voice Privacy Challenge 2024. Rather than proposing a novel speech anonymization system, we enhance the provided baselines to meet all required conditions and improve evaluated metrics. Specifically, we implement emotion embedding and experiment with WavLM and ECAPA2 speaker embedders for the B3 baseline. Additionally, we compare different speaker…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: System description for VPC 2024

    Journal ref: 2024 Challenge. Proc. 4th Symposium on Security and Privacy in Speech Communication, 72-79

  36. arXiv:2410.01350  [pdf, other]

    cs.SD cs.AI eess.AS

    Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling

    Authors: Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye, Hongbin Zhou, Lei Xie, Lei Ma, Jianjun Zhao

    Abstract: Expressive zero-shot voice conversion (VC) is a critical and challenging task that aims to transform the source timbre into an arbitrary unseen speaker while preserving the original content and expressive qualities. Despite recent progress in zero-shot VC, there remains considerable potential for improvements in speaker similarity and speech naturalness. Moreover, existing zero-shot VC systems str…

    Submitted 10 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Work in Progress; Under Review

  37. arXiv:2409.12139  [pdf, other]

    cs.SD cs.AI eess.AS

    Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

    Authors: Sijing Chen, Yuan Feng, Laipeng He, Tianwei He, Wendi He, Yanni Hu, Bin Lin, Yiting Lin, Yu Pan, Pengfei Tan, Chengwei Tian, Chen Wang, Zhicheng Wang, Ruoye Xie, Jixun Yao, Quanlei Yan, Yuguang Yang, Jianhao Ye, Jingjing Yin, Yanzhen Yu, Huimin Zhang, Xiang Zhang, Guangcheng Zhao, Hongbin Zhou, Pengpeng Zou

    Abstract: With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-…

    Submitted 23 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: Technical Report; 18 pages; typos corrected, references added, demo URL modified, author name modified

  38. arXiv:2409.04173  [pdf, other]

    eess.AS

    NPU-NTU System for Voice Privacy 2024 Challenge

    Authors: Jixun Yao, Nikita Kuzmin, Qing Wang, Pengcheng Guo, Ziqian Ning, Dake Guo, Kong Aik Lee, Eng-Siong Chng, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution that conceals the speaker's identity while preserving the linguistic content and paralinguistic information of the original speech. To establish a fair benchmark and facilitate comparison of speaker anonymization systems, the VoicePrivacy Challenge (VPC) was held in 2020 and 2022, with a new edition planned for 2024. In this paper,…

    Submitted 4 February, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: System description for VPC 2024

  39. arXiv:2408.15474  [pdf, other]

    eess.AS cs.SD

    Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

    Authors: Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie

    Abstract: Rap, a prominent genre of vocal performance, remains underexplored in vocal generation. General vocal synthesis depends on precise note and duration inputs, requiring users to have related musical knowledge, which limits flexibility. In contrast, rap typically features simpler melodies, with a core focus on a strong rhythmic sense that harmonizes with accompanying beats. In this paper, we propose…

    Submitted 27 August, 2024; originally announced August 2024.

  40. arXiv:2408.13447  [pdf, ps, other]

    eess.SP

    FAS-RIS Communication: Model, Analysis, and Optimization

    Authors: Junteng Yao, Jianchao Zheng, Tuo Wu, Ming Jin, Chau Yuen, Kai-Kit Wong, Fumiyuki Adachi

    Abstract: This correspondence investigates the novel fluid antenna system (FAS) technology, combined with a reconfigurable intelligent surface (RIS) for wireless communications, where a base station (BS) communicates with a FAS-enabled user with the assistance of a RIS. To analyze this technology, we derive the outage probability based on the block-diagonal matrix approximation (BDMA) model. With this, we ob…

    Submitted 23 August, 2024; originally announced August 2024.

  41. arXiv:2408.13444  [pdf, ps, other]

    eess.SP

    FAS-RIS: A Block-Correlation Model Analysis

    Authors: Xiazhi Lai, Junteng Yao, Kangda Zhi, Tuo Wu, David Morales-Jimenez, Kai-Kit Wong

    Abstract: In this correspondence, we analyze the performance of a reconfigurable intelligent surface (RIS)-aided communication system that involves a fluid antenna system (FAS)-enabled receiver. By applying the central limit theorem (CLT), we derive approximate expressions for the system outage probability when the RIS has a large number of elements. Also, we adopt the block-correlation channel model to sim…

    Submitted 23 August, 2024; originally announced August 2024.

  42. Empowering Over-the-Air Personalized Federated Learning via RIS

    Authors: Wei Shi, Jiacheng Yao, Jindan Xu, Wei Xu, Lexi Xu, Chunming Zhao

    Abstract: Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. However, AirComp-enabled FL (AirFL) with a single global consensus model fails to address the data heterogeneity in real-life FL scenarios with non-independent and identically distributed l…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by SCIENCE CHINA Information Sciences

  43. arXiv:2408.09067  [pdf, ps, other]

    eess.SP

    FAS vs. ARIS: Which Is More Important for FAS-ARIS Communication Systems?

    Authors: Junteng Yao, Liaoshi Zhou, Tuo Wu, Ming Jin, Chongwen Huang, Chau Yuen

    Abstract: In this paper, we investigate the question of which technology, fluid antenna systems (FAS) or active reconfigurable intelligent surfaces (ARIS), plays a more crucial role in FAS-ARIS wireless communication systems. To address this, we develop a comprehensive system model and explore the problem from an optimization perspective. We introduce an alternating optimization (AO) algorithm incorporating…

    Submitted 16 August, 2024; originally announced August 2024.

  44. arXiv:2407.18054  [pdf, other]

    eess.IV cs.CV

    LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels

    Authors: Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, Xinggang Wang

    Abstract: The segmentation of cell nuclei in tissue images stained with the blood dye hematoxylin and eosin (H&E) is essential for various clinical applications and analyses. Due to the complex characteristics of cellular morphology, a large receptive field is considered crucial for generating high-quality segmentation. However, previous methods face challenges in achieving a balance between the receptiv…

    Submitted 25 July, 2024; originally announced July 2024.

  45. arXiv:2407.17460  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning

    Authors: Jianpeng Yao, Xiaopan Zhang, Yu Xia, Zejin Wang, Amit K. Roy-Chowdhury, Jiachen Li

    Abstract: Reinforcement learning (RL) enables social robots to generate trajectories without relying on human-designed rules or interventions, making it generally more effective than rule-based systems in adapting to complex, dynamic real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians, whereas existing RL-based solutions often…

    Submitted 6 February, 2025; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Project website: https://sonic-social-nav.github.io/; 16 pages

  46. arXiv:2407.12648  [pdf, ps, other]

    cs.IT eess.SP

    Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

    Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

    Abstract: Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namel…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 17 pages

  47. arXiv:2407.11629  [pdf, other]

    eess.AS

    MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement

    Authors: Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, Lei Xie

    Abstract: Speaker anonymization is an effective privacy protection solution designed to conceal the speaker's identity while preserving the linguistic content and para-linguistic information of the original speech. While most prior studies focus solely on a single language, an ideal speaker anonymization system should be capable of handling multiple languages. This paper proposes MUSA, a Multi-lingual Speak…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Submitted to TASLP

  48. arXiv:2407.11307  [pdf, ps, other]

    eess.SP

    Fluid Antenna-Assisted Simultaneous Wireless Information and Power Transfer Systems

    Authors: Liaoshi Zhou, Junteng Yao, Tuo Wu, Ming Jin, Chau Yuen, Fumiyuki Adachi

    Abstract: This paper examines a fluid antenna (FA)-assisted simultaneous wireless information and power transfer (SWIPT) system. Unlike traditional SWIPT systems with fixed-position antennas (FPAs), our FA-assisted system enables dynamic reconfiguration of the radio propagation environment by adjusting the positions of FAs. This capability enhances both energy harvesting and communication performance. The s…

    Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  49. arXiv:2407.08141  [pdf, ps, other]

    eess.SP

    A Framework of FAS-RIS Systems: Performance Analysis and Throughput Optimization

    Authors: Junteng Yao, Xiazhi Lai, Kangda Zhi, Tuo Wu, Ming Jin, Cunhua Pan, Maged Elkashlan, Chau Yuen, Kai-Kit Wong

    Abstract: In this paper, we investigate reconfigurable intelligent surface (RIS)-assisted communication systems which involve a fixed-antenna base station (BS) and a mobile user (MU) that is equipped with fluid antenna system (FAS). Specifically, the RIS is utilized to enable communication for the user whose direct link from the base station is blocked by obstacles. We propose a comprehensive framework that…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal for possible publication

  50. arXiv:2407.00718  [pdf, other]

    eess.IV cs.CV

    ASPS: Augmented Segment Anything Model for Polyp Segmentation

    Authors: Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han

    Abstract: Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the emergence of the Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation, leveraging its powerful pre-training capability on large-scale datasets. However, due to the domain gap between natural and endoscopy images, SAM encounters two limitations in achieving effective performan…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024