Showing 1–10 of 10 results for author: Kang, W H

  1. arXiv:2205.01528  [pdf, other]

    eess.AS cs.CR cs.SD

    Attentive activation function for improving end-to-end spoofing countermeasure systems

    Authors: Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

    Abstract: The main objective of the spoofing countermeasure system is to detect the artifacts within the input speech caused by the speech synthesis or voice conversion process. In order to achieve this, we propose to adopt an attentive activation function, more specifically attention rectified linear unit (AReLU) to the end-to-end spoofing countermeasure system. Since the AReLU employs the attention mechan…

    Submitted 3 May, 2022; originally announced May 2022.

  2. arXiv:2112.03454  [pdf, other]

    eess.AS cs.SD

    Robust Speech Representation Learning via Flow-based Embedding Regularization

    Authors: Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

    Abstract: Over the recent years, various deep learning-based methods were proposed for extracting a fixed-dimensional embedding vector from speech signals. Although the deep learning-based embedding extraction methods have shown good performance in numerous tasks including speaker verification, language identification and anti-spoofing, their performance is limited when it comes to mismatched conditions due…

    Submitted 6 December, 2021; originally announced December 2021.

  3. arXiv:2102.03542   

    eess.SP cs.LG

    Continuous Monitoring of Blood Pressure with Evidential Regression

    Authors: Hyeongju Kim, Woo Hyun Kang, Hyeonseung Lee, Nam Soo Kim

    Abstract: Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is a promising candidate for modern BP measurements, as PPG signals can be easily obtained from wearable devices in a non-invasive manner, allowing quick BP measurement. However, the performance of existing machine learning-based BP measuring methods still fall behind some BP measurement guidelines and most of them provide only p…

    Submitted 25 February, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

    Comments: We found some errors in the experimental configuration. We plan to revise the paper and republish it later

  4. arXiv:2010.11433  [pdf, other]

    eess.AS cs.SD

    Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. Also, to preserve speaker discriminability, a contrastive similarity loss function is used together. Experimental results showed that the pr…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, 4 tables

  5. arXiv:2010.11408  [pdf, ps, other]

    eess.AS cs.SD

    Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in INTERSPEECH 2020

  6. Disentangled speaker and nuisance attribute embedding for robust speaker verification

    Authors: Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states)…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Access

  7. Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

    Abstract: For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end with…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 7 pages, 3 figures

    Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020

  8. Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

    Authors: Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim

    Abstract: Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). As the original AED models with global attentions are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for better user experience. However, a common limitation of the conventional softmax-based online attent…

    Submitted 14 January, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

  9. arXiv:2006.04598  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

    Abstract: In recent years, various flow-based generative models have been proposed to generate high-fidelity waveforms in real-time. However, these models require either a well-trained teacher network or a number of flow steps making them memory-inefficient. In this paper, we propose a novel generative model called WaveNODE which exploits a continuous normalizing flow for speech synthesis. Unlike the conven…

    Submitted 2 July, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 8 pages, 4 figures, Second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020)

  10. arXiv:1910.09275  [pdf, other]

    cs.CL eess.AS

    Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution

    Authors: Won Ik Cho, Jeonghwa Cho, Woo Hyun Kang, Nam Soo Kim

    Abstract: Analyzing how human beings resolve syntactic ambiguity has long been an issue of interest in the field of linguistics. It is, at the same time, one of the most challenging issues for spoken language understanding (SLU) systems as well. As syntactic ambiguity is intertwined with issues regarding prosody and semantics, the computational approach toward speech intention identification is expected to…

    Submitted 21 May, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: CogSci 2020 Camera-ready