
Showing 1–21 of 21 results for author: Pham, N

Searching in archive eess.
  1. arXiv:2506.16580 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement

    Authors: Tuan-Nam Nguyen, Ngoc-Quan Pham, Seymanur Akti, Alexander Waibel

    Abstract: We propose a first streaming accent conversion (AC) model that transforms non-native speech into a native-like accent while preserving speaker identity and prosody and improving pronunciation. Our approach enables stream processing by modifying a previous AC architecture with an Emformer encoder and an optimized inference mechanism. Additionally, we integrate a native text-to-speech (TTS) model to ge…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  2. arXiv:2506.16574 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Weight Factorization and Centralization for Continual Learning in Speech Recognition

    Authors: Enes Yavuz Ugan, Ngoc-Quan Pham, Alexander Waibel

    Abstract: Modern neural-network-based speech recognition models are required to continually absorb new data without re-training the whole system, especially in downstream applications using foundation models, having no access to the original training data. Continually training the models in a rehearsal-free, multilingual, and language-agnostic condition likely leads to catastrophic forgetting, when a seemi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  3. arXiv:2506.00368 [pdf, ps, other]

    eess.SP cs.AI

    Neural Network-based Information-Theoretic Transceivers for High-Order Modulation Schemes

    Authors: Ngoc Long Pham, Tri Nhu Do

    Abstract: Neural network (NN)-based end-to-end (E2E) communication systems, in which each system component may consist of a portion of a neural network, have been investigated as potential tools for developing artificial intelligence (AI)-native E2E systems. In this paper, we propose an NN-based bitwise receiver that improves computational efficiency while maintaining performance comparable to baseline dema…

    Submitted 30 May, 2025; originally announced June 2025.

  4. arXiv:2410.14997 [pdf, other]

    cs.SD cs.AI eess.AS

    Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS

    Authors: Tuan Nam Nguyen, Seymanur Akti, Ngoc Quan Pham, Alexander Waibel

    Abstract: Previous approaches to accent conversion (AC) mainly aimed at making non-native speech sound more native while maintaining the original content and speaker identity. However, non-native speakers sometimes have pronunciation issues, which can make it difficult for listeners to understand them. Hence, we developed a new AC approach that not only focuses on accent conversion but also improves pronunc…

    Submitted 4 March, 2025; v1 submitted 19 October, 2024; originally announced October 2024.

    Comments: accepted at ICASSP 2025

  5. arXiv:2410.08229 [pdf, ps, other]

    cs.CV cs.NE eess.IV

    Improvement of Spiking Neural Network with Bit Planes and Color Models

    Authors: Nhan T. Luu, Duong T. Luu, Nam N. Pham, Thang C. Truong

    Abstract: Spiking neural networks (SNNs) have emerged as a promising paradigm in computational neuroscience and artificial intelligence, offering advantages such as low energy consumption and small memory footprint. However, their practical adoption is constrained by several challenges, most prominently performance optimization. In this study, we present a novel approach to enhance the performance…

    Submitted 11 July, 2025; v1 submitted 28 September, 2024; originally announced October 2024.

    Comments: 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN)

  6. arXiv:2410.03734 [pdf, other]

    cs.SD cs.CL eess.AS

    Accent conversion using discrete units with parallel data synthesized from controllable accented TTS

    Authors: Tuan Nam Nguyen, Ngoc Quan Pham, Alexander Waibel

    Abstract: The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity. Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent. This paper presents a promising AC model that can convert many accents into native to overcome…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Accepted at Syndata4genAI

  7. arXiv:2406.19077 [pdf, other]

    eess.SY

    Parameter Dependent Chen–Fliess Series and Their Nonrecursive Interconnections

    Authors: W. Steven Gray, Natalie Pham

    Abstract: A class of parameter dependent Chen–Fliess series is introduced where the series coefficients are taken from a noncommutative ring of multivariable differential operators. Such series are shown in the linear case to represent formal solutions to Cauchy initial value problems for nonhomogeneous PDEs and thus are useful for characterizing the input-output maps of distributed control systems. It is…

    Submitted 27 June, 2024; originally announced June 2024.

    MSC Class: 41A58; 93C10; 35C10

  8. arXiv:2401.05425 [pdf]

    eess.SP cs.LG

    An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection

    Authors: Abdul Aziz, Nhat Pham, Neel Vora, Cody Reynolds, Jaime Lehnen, Pooja Venkatesh, Zhuoran Yao, Jay Harvey, Tam Vu, Kan Ding, Phuc Nguyen

    Abstract: Epilepsy is one of the most common neurological diseases globally (around 50 million people worldwide). Fortunately, up to 70% of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scalp-based EEG test…

    Submitted 24 October, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  9. arXiv:2311.11096 [pdf, other]

    eess.IV cs.CV

    On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

    Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

    Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks with the need for…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

  10. arXiv:2211.11703 [pdf, other]

    cs.CL cs.SD eess.AS

    Towards continually learning new languages

    Authors: Ngoc-Quan Pham, Jan Niehues, Alexander Waibel

    Abstract: Multilingual speech recognition with neural networks is often implemented with batch-learning, when all of the languages are available before training. An ability to add new languages after the prior training sessions can be economically beneficial, but the main challenge is catastrophic forgetting. In this work, we combine the qualities of weight factorization and elastic weight consolidation in…

    Submitted 17 July, 2024; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Work in progress

  11. arXiv:2211.02592 [pdf]

    eess.SY

    A Large-Scale Study of a Sleep Tracking and Improving Device with Closed-loop and Personalized Real-time Acoustic Stimulation

    Authors: Anh Nguyen, Galen Pogoncheff, Ban Xuan Dong, Nam Bui, Hoang Truong, Nhat Pham, Linh Nguyen, Hoang Huu Nguyen, Sy Duong-Quy, Sangtae Ha, Tam Vu

    Abstract: Various intervention therapies ranging from pharmaceutical to hi-tech tailored solutions have been available to treat difficulty in falling asleep commonly caused by insomnia in modern life. However, current techniques largely remain ill-suited, ineffective, and unreliable due to their lack of precise real-time sleep tracking, in-time feedback on the therapies, an ability to keep people asleep dur…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 33 pages, 8 figures

  12. arXiv:2205.12304 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Adaptive multilingual speech recognition with pretrained models

    Authors: Ngoc-Quan Pham, Alex Waibel, Jan Niehues

    Abstract: Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from unsupervised multilingual models to facilitate recognition, especially in many languages with limited data. Our work investigated the effectiveness of using two pretra…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  13. arXiv:2109.09026 [pdf, other]

    cs.SD cs.HC cs.LG eess.AS

    Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-Recurrent Neural Networks for Speech Emotion Recognition

    Authors: Nhat Truong Pham, Duc Ngoc Minh Dang, Sy Dzung Nguyen

    Abstract: Speech emotion recognition (SER) has been one of the significant tasks in Human-Computer Interaction (HCI) applications. However, it is hard to choose the optimal features and deal with imbalanced labeled data. In this article, we investigate hybrid data augmentation (HDA) methods to generate and balance data based on traditional and generative adversarial networks (GAN) methods. To evaluate the ef…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: 12 pages, 16 figures, 6 tables

  14. arXiv:2109.03219 [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    Fruit-CoV: An Efficient Vision-based Framework for Speedy Detection and Diagnosis of SARS-CoV-2 Infections Through Recorded Cough Sounds

    Authors: Long H. Nguyen, Nhat Truong Pham, Van Huong Do, Liu Tai Nguyen, Thanh Tin Nguyen, Van Dung Do, Hai Nguyen, Ngoc Duy Nguyen

    Abstract: SARS-CoV-2 is colloquially known as COVID-19 that had an initial outbreak in December 2019. The deadly virus has spread across the world, taking part in the global pandemic disease since March 2020. In addition, a recent variant of SARS-CoV-2 named Delta is intractably contagious and responsible for more than four million deaths over the world. Therefore, it is vital to possess a self-testing serv…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 4 pages

  15. arXiv:2108.11089 [pdf, other]

    cs.SD eess.AS

    Detecting Drill Failure in the Small Short-sound Drill Dataset

    Authors: Thanh Tran, Nhat Truong Pham, Jan Lundgren

    Abstract: Monitoring the conditions of machines is vital in the manufacturing industry. Early detection of faulty components in machines for stopping and repairing the failed components can minimize the downtime of the machine. This article presents an approach to detect the failure occurring in drill machines based on drill sounds from Valmet AB. The drill dataset includes three classes: anomalous sounds,…

    Submitted 9 November, 2021; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: 8 pages, 10 figures, journal

  16. arXiv:2105.03010 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Efficient Weight factorization for Multilingual Speech Recognition

    Authors: Ngoc-Quan Pham, Tuan-Nam Nguyen, Sebastian Stueker, Alexander Waibel

    Abstract: End-to-end multilingual speech recognition involves using a single model trained on a compositional speech corpus including many languages, resulting in a single neural network that handles transcribing different languages. Because each language in the training data has different characteristics, the shared network may struggle to optimize for all languages simultaneously. In th…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: Submitted to Interspeech 2021

  17. arXiv:2005.09940 [pdf, other]

    eess.AS cs.CL cs.SD

    Relative Positional Encoding for Speech Recognition and Direct Translation

    Authors: Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alexander Waibel

    Abstract: Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  18. arXiv:2003.10022 [pdf, other]

    eess.AS cs.CL cs.SD

    High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

    Authors: Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stueker, Alex Waibel

    Abstract: Recently, sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing. However, when it comes to performing run-on recognition on an input stream of audio data while producing recognition results in real-time and with low word-based…

    Submitted 26 July, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

    Comments: To appear in Interspeech 2020

  19. arXiv:1910.05603 [pdf, other]

    cs.CL cs.SD eess.AS

    VAIS ASR: Building a conversational speech recognition system using language model combination

    Authors: Quang Minh Nguyen, Thai Binh Nguyen, Ngoc Phuong Pham, The Loc Nguyen

    Abstract: Automatic Speech Recognition (ASR) systems have been evolving quickly and reaching human parity in certain cases. The systems usually perform pretty well on reading style and clean speech; however, most of the available systems suffer in situations where the speaking style is conversational and in noisy environments. It is not straight-forward to tackle such problems due to difficulties in data col…

    Submitted 12 October, 2019; originally announced October 2019.

    Comments: 3 pages, 1 figure, Vietnamese Language and Speech Processing conference

  20. arXiv:1908.09766 [pdf]

    cs.NI eess.SY

    A Hybrid of Adaptation and Dynamic Routing based on SDN for Improving QoE in HTTP Adaptive VBR Video Streaming

    Authors: Hong Thinh Pham, Ngoc Nam Pham, Huu Thanh Nguyen, Alan Marshall, Thu Huong Truong

    Abstract: Recently, HTTP Adaptive Streaming (HAS) has received significant attention from both industry and academia based on its ability to enhance media streaming services over the Internet. Recent research solutions that have tried to improve HAS by adaptation at the client side only may not be completely effective without interacting with routing decisions in the upper layers. In this paper, we address…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: 14 pages, 17 figures, IJCSNS International Journal of Computer Science and Network Security, http://paper.ijcsns.org/07_book/201907/20190708.pdf

    Journal ref: VOL.19 No.7, July 2019

  21. arXiv:1904.13377 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Very Deep Self-Attention Networks for End-to-End Speech Recognition

    Authors: Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Sebastian Stüker, Alexander Waibel

    Abstract: Recently, end-to-end sequence-to-sequence models for speech recognition have gained significant interest in the research community. While previous architecture choices revolve around time-delay neural networks (TDNN) and long short-term memory (LSTM) recurrent neural networks, we propose to use self-attention via the Transformer architecture as an alternative. Our analysis shows that deep Transfor…

    Submitted 3 May, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: Submitted to INTERSPEECH 2019