Showing 1–5 of 5 results for author: Nguyen, L T

Searching in archive eess.
  1. arXiv:2506.01322  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Zero-Shot Text-to-Speech for Vietnamese

    Authors: Thi Vu, Linh The Nguyen, Dat Quoc Nguyen

    Abstract: This paper introduces PhoAudiobook, a newly curated dataset comprising 941 hours of high-quality audio for Vietnamese text-to-speech. Using PhoAudiobook, we conduct experiments on three leading zero-shot TTS models: VALL-E, VoiceCraft, and XTTS-V2. Our findings demonstrate that PhoAudiobook consistently enhances model performance across various metrics. Moreover, VALL-E and VoiceCraft exhibit supe…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: To appear in Proceedings of ACL 2025 (Main conference paper)

  2. arXiv:2406.02555  [pdf, ps, other]

    eess.AS cs.CL

    PhoWhisper: Automatic Speech Recognition for Vietnamese

    Authors: Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

    Abstract: We introduce PhoWhisper in five versions for Vietnamese automatic speech recognition. PhoWhisper's robustness is achieved through fine-tuning the Whisper model on an 844-hour dataset that encompasses diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets. We have open-sourced PhoWhisper at: https://github.com…

    Submitted 27 March, 2024; originally announced June 2024.

    Comments: Accepted to ICLR 2024 Tiny Papers Track

  3. arXiv:2402.01198  [pdf, other]

    cs.IT eess.SP

    Physical Layer Location Privacy in SIMO Communication Using Fake Path Injection

    Authors: Trong Duy Tran, Maxime Ferreira Da Costa, Linh Trung Nguyen

    Abstract: Fake path injection is an emerging paradigm for inducing privacy over wireless networks. In this paper, fake paths are injected by the transmitters into a single-input multiple-output (SIMO) communication channel to obscure their physical location from an eavesdropper. The case where the receiver (Bob) and the eavesdropper (Eve) use a uniform linear array to locate the transmitter's (Alice's) positi…

    Submitted 3 February, 2025; v1 submitted 2 February, 2024; originally announced February 2024.

  4. arXiv:2305.19709  [pdf, other]

    cs.CL cs.SD eess.AS

    XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

    Authors: Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

    Abstract: We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. Our XPhoneBERT has the same model architecture as BERT-base, trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encod…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: In Proceedings of INTERSPEECH 2023 (to appear)

  5. arXiv:2109.03219  [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    Fruit-CoV: An Efficient Vision-based Framework for Speedy Detection and Diagnosis of SARS-CoV-2 Infections Through Recorded Cough Sounds

    Authors: Long H. Nguyen, Nhat Truong Pham, Van Huong Do, Liu Tai Nguyen, Thanh Tin Nguyen, Van Dung Do, Hai Nguyen, Ngoc Duy Nguyen

    Abstract: SARS-CoV-2 is the virus that causes COVID-19, which had an initial outbreak in December 2019. The deadly virus has spread across the world, driving the global pandemic since March 2020. In addition, a recent variant of SARS-CoV-2 named Delta is highly contagious, and the virus has been responsible for more than four million deaths worldwide. Therefore, it is vital to have a self-testing serv…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 4 pages