Showing 1–11 of 11 results for author: Chi, T

Searching in archive eess.
  1. Personalized Head-Related Transfer Function Prediction Based on Spatial Grouping

    Authors: Keng-Wei Chang, Yih-Liang Shen, Tai-Shi Chi

    Abstract: The head-related transfer function (HRTF) characterizes the frequency response of the sound traveling path between a specific location and the ear. When it comes to estimating HRTFs by neural network models, angle-specific models greatly outperform global models but demand high computational resources. To balance the computational resource and performance, we propose a method by grouping HRTF data…

    Submitted 10 December, 2024; originally announced December 2024.

  2. arXiv:2406.08445  [pdf, other]

    eess.AS cs.LG cs.SD

    SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models

    Authors: Chun Yin, Tai-Shih Chi, Yu Tsao, Hsin-Min Wang

    Abstract: Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  3. arXiv:2306.06653  [pdf, other]

    cs.SD eess.AS

    Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features

    Authors: Hsin-Hao Chen, Yung-Lun Chien, Ming-Chi Yen, Shu-Wei Tsai, Yu Tsao, Tai-shih Chi, Hsin-Min Wang

    Abstract: Patients who have had their entire larynx removed, including the vocal folds, owing to throat cancer may experience difficulties in speaking. In such cases, electrolarynx devices are often prescribed to produce speech, which is commonly referred to as electrolaryngeal speech (EL speech). However, the quality and intelligibility of EL speech are poor. To address this problem, EL voice conversion (E…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to INTERSPEECH 2023

  4. arXiv:2306.06652  [pdf, other]

    cs.SD eess.AS

    Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion

    Authors: Yung-Lun Chien, Hsin-Hao Chen, Ming-Chi Yen, Shu-Wei Tsai, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi

    Abstract: Electrolarynx is a commonly used assistive device to help patients with removed vocal cords regain their ability to speak. Although the electrolarynx can generate excitation signals like the vocal cords, the naturalness and intelligibility of electrolaryngeal (EL) speech are very different from those of natural (NL) speech. Many deep-learning-based models have been applied to electrolaryngeal spee…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: Accepted to INTERSPEECH 2023

  5. arXiv:2303.03634  [pdf]

    eess.SP cs.LG

    PreFallKD: Pre-Impact Fall Detection via CNN-ViT Knowledge Distillation

    Authors: Tin-Han Chi, Kai-Chun Liu, Chia-Yeh Hsieh, Yu Tsao, Chia-Tai Chan

    Abstract: Fall accidents are critical issues in an aging and aged society. Recently, many researchers developed pre-impact fall detection systems using deep learning to support wearable-based fall protection systems for preventing severe injuries. However, most works only employed simple neural network models instead of complex models considering the usability in resource-constrained mobile devices and stri…

    Submitted 28 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  6. arXiv:2110.09923  [pdf, ps, other]

    eess.AS cs.SD

    Speech Enhancement-assisted Voice Conversion in Noisy Environments

    Authors: Yun-Ju Chan, Chiang-Jen Peng, Syu-Siang Wang, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi

    Abstract: Numerous voice conversion (VC) techniques have been proposed for the conversion of voices among different speakers. Although good quality of the converted speech can be observed when VC is applied in a clean environment, the quality degrades drastically when the system is run in noisy conditions. In order to address this issue, we propose a novel speech enhancement (SE)-assisted VC system that uti…

    Submitted 19 January, 2023; v1 submitted 19 October, 2021; originally announced October 2021.

    Journal ref: APSIPA 2022

  7. arXiv:2101.02550  [pdf, ps, other]

    eess.AS cs.SD

    Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

    Authors: Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi

    Abstract: Multi-task learning (MTL) and attention mechanism have been proven to effectively extract robust acoustic features for various speech-related tasks in noisy environments. In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speake…

    Submitted 21 February, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Journal ref: IEEE International Symposium on Circuits and Systems 2021

  8. arXiv:2012.08095  [pdf, other]

    cs.LG eess.AS

    Automatic Speech Verification Spoofing Detection

    Authors: Shentong Mo, Haofan Wang, Pinxu Ren, Ta-Chung Chi

    Abstract: Automatic speech verification (ASV) is the technology to determine the identity of a person based on their voice. While being convenient for identity verification, we should aim for the highest system security standard given that it is the safeguard of valuable digital assets. Bearing this in mind, we follow the setup in ASVSpoof 2019 competition to develop potential countermeasures that are robus…

    Submitted 15 December, 2020; originally announced December 2020.

  9. arXiv:1912.11984  [pdf]

    cs.SD eess.AS

    MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation

    Authors: Yu-Tao Chang, Yuan-Hong Yang, Yu-Huai Peng, Syu-Siang Wang, Tai-Shih Chi, Yu Tsao, Hsin-Min Wang

    Abstract: With the recent advancements of deep learning technologies, the performance of voice conversion (VC) in terms of quality and similarity has been significantly improved. However, heavy computations are generally required for deep-learning-based VC systems, which can cause notable latency and thus confine their deployments in real-world applications. Therefore, increasing online computation efficien…

    Submitted 26 December, 2019; originally announced December 2019.

    Comments: Submitted to ICASSP 2020

  10. arXiv:1811.04224  [pdf, ps, other]

    eess.AS cs.SD

    Reinforcement Learning Based Speech Enhancement for Robust Speech Recognition

    Authors: Yih-Liang Shen, Chao-Yuan Huang, Syu-Siang Wang, Yu Tsao, Hsin-Min Wang, Tai-Shih Chi

    Abstract: Conventional deep neural network (DNN)-based speech enhancement (SE) approaches aim to minimize the mean square error (MSE) between enhanced speech and clean reference. The MSE-optimized model may not directly improve the performance of an automatic speech recognition (ASR) system. If the target is to minimize the recognition error, the recognition results should be used to design the objective fu…

    Submitted 10 November, 2018; originally announced November 2018.

    Comments: Conference paper with 4 pages, reinforcement learning, automatic speech recognition, speech enhancement, deep neural network, character error rate

  11. arXiv:1711.08600  [pdf, other]

    eess.AS

    Singing voice correction using canonical time warping

    Authors: Yin-Jyun Luo, Ming-Tso Chen, Tai-Shih Chi, Li Su

    Abstract: Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm which synchronizes two singing recordings can provide a promising solution. We thereby propose to address the problem by canonical time warping (CTW) which aligns amateur singing recordings to professional ones. A new pitch contour is generated given the alignment information, and a pitch-c…

    Submitted 23 November, 2017; originally announced November 2017.