Showing 1–8 of 8 results for author: Lei, G

Searching in archive eess.
  1. arXiv:2505.22053  [pdf, other]

    cs.SD cs.MA cs.MM eess.AS

    AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation

    Authors: Yan Rong, Jinting Wang, Shan Yang, Guangzhi Lei, Li Liu

    Abstract: Multimodality-to-Multiaudio (MM2MA) generation faces significant challenges in synthesizing diverse and contextually aligned audio types (e.g., sound effects, speech, music, and songs) from multimodal inputs (e.g., video, text, images), owing to the scarcity of high-quality paired datasets and the lack of robust multi-task learning frameworks. Recently, multi-agent system shows great potential in…

    Submitted 28 May, 2025; originally announced May 2025.

  2. arXiv:2505.06248  [pdf, ps, other]

    eess.SP cs.IT

    Low-Complexity Channel Estimation in OTFS Systems with Fractional Effects

    Authors: Guangyu Lei, Yanduo Qiao, Tianhao Liang, Weijie Yuan, Tingting Zhang

    Abstract: Orthogonal Time Frequency Space (OTFS) modulation exploits the sparsity of Delay-Doppler domain channels, making it highly effective in high-mobility scenarios. Its accurate channel estimation supports integrated sensing and communication (ISAC) systems. The letter introduces a low-complexity technique for estimating delay and Doppler shifts under fractional effects, while addressing inter-path in…

    Submitted 28 April, 2025; originally announced May 2025.

  3. arXiv:2504.11002  [pdf, other]

    cs.SD cs.MM eess.AS

    Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation

    Authors: Yan Rong, Shan Yang, Guangzhi Lei, Li Liu

    Abstract: Audiobook generation, which creates vivid and emotion-rich audio works, faces challenges in conveying complex emotions, achieving human-like qualities, and aligning evaluations with human preferences. Existing text-to-speech (TTS) methods are often limited to specific scenarios, struggle with emotional transitions, and lack automatic human-aligned evaluation benchmarks, instead relying on either m…

    Submitted 15 April, 2025; originally announced April 2025.

  4. arXiv:2501.04964  [pdf]

    eess.SY

    Promoting Shared Energy Storage Aggregation among High Price-Tolerance Prosumer: An Incentive Deposit and Withdrawal Service

    Authors: Xin Lu, Jing Qiu, Cuo Zhang, Gang Lei, Jianguo Zhu

    Abstract: Many residential prosumers exhibit a high price-tolerance for household electricity bills and a low response to price incentives. This is because the household electricity bills are not inherently high, and the potential for saving on electricity bills through participation in conventional Shared Energy Storage (SES) is limited, which diminishes their motivation to actively engage in SES. Addition…

    Submitted 13 January, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

  5. arXiv:2410.13288  [pdf, other]

    eess.AS cs.SD

    DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis

    Authors: Yu Gu, Qiushi Zhu, Guangzhi Lei, Chao Weng, Dan Su

    Abstract: This paper proposes an improved version of DurIAN-E (DurIAN-E 2), which is also a duration informed attention neural network for expressive and high-fidelity text-to-speech (TTS) synthesis. Similar with the DurIAN-E model, multiple stacked SwishRNN-based Transformer blocks are utilized as linguistic encoders and Style-Adaptive Instance Normalization (SAIN) layers are also exploited into frame-leve…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted by ICASSP2024

  6. arXiv:2309.12792  [pdf, other]

    eess.AS cs.SD

    DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis

    Authors: Yu Gu, Yianrao Bian, Guangzhi Lei, Chao Weng, Dan Su

    Abstract: This paper introduces an improved duration informed attention neural network (DurIAN-E) for expressive and high-fidelity text-to-speech (TTS) synthesis. Inherited from the original DurIAN model, an auto-regressive model structure in which the alignments between the input linguistic information and the output acoustic features are inferred from a duration model is adopted. Meanwhile the proposed Du…

    Submitted 22 September, 2023; originally announced September 2023.

  7. arXiv:2006.11610  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

    Authors: Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

    Abstract: Generating 3D speech-driven talking head has received more and more attention in recent years. Recent approaches mainly have following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input. In this work, we propose a novel approach using phone…

    Submitted 20 June, 2020; originally announced June 2020.

    Comments: 5 pages, 5 figures

  8. arXiv:1909.01700  [pdf, other]

    cs.CL cs.CV cs.SD eess.AS

    DurIAN: Duration Informed Attention Network For Multimodal Synthesis

    Authors: Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu

    Abstract: In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously. The key component of this system is the Duration Informed Attention Network (DurIAN), an autoregressive model in which the alignments between the input text and the output acoustic features are inferred from a duration model. This is different from th…

    Submitted 5 September, 2019; v1 submitted 4 September, 2019; originally announced September 2019.