
Showing 1–11 of 11 results for author: Yen, H

Searching in archive eess.
  1. arXiv:2507.02192  [pdf, ps, other]

    eess.AS

    An Investigation on Combining Geometry and Consistency Constraints into Phase Estimation for Speech Enhancement

    Authors: Chun-Wei Ho, Pin-Jui Ku, Hao Yen, Sabato Marco Siniscalchi, Yu Tsao, Chin-Hui Lee

    Abstract: We propose a novel iterative phase estimation framework, termed multi-source Griffin-Lim algorithm (MSGLA), for speech enhancement (SE) under additive noise conditions. The core idea is to leverage the ad-hoc consistency constraint of complex-valued short-time Fourier transform (STFT) spectrograms to address the sign ambiguity challenge commonly encountered in geometry-based phase estimation. Furt…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 5 pages

  2. arXiv:2409.19757  [pdf, other]

    eess.AS cs.SD

    Efficient Long-Form Speech Recognition for General Speech In-Context Learning

    Authors: Hao Yen, Shaoshi Ling, Guoli Ye

    Abstract: We propose a novel approach to end-to-end automatic speech recognition (ASR) to achieve efficient speech in-context learning (SICL) for (i) long-form speech decoding, (ii) test-time speaker adaptation, and (iii) test-time contextual biasing. Specifically, we introduce an attention-based encoder-decoder (AED) model with SICL capability (referred to as SICL-AED), where the decoder utilizes an uttera…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: 5 pages, Submitted to ICASSP 2025

  3. arXiv:2409.16282  [pdf, other]

    eess.AS cs.SD

    An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement

    Authors: Pin-Jui Ku, Chun-Wei Ho, Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: In this work, we propose a novel consistency-preserving loss function for recovering the phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Different from conventional techniques that directly estimate the phase using a deep model, our idea is to exploit ad-hoc constraints to directly generate a consistent pair of magnitude and phase. Specifically, the propo…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 5 pages, Submitted to ICASSP 2025

  4. arXiv:2406.02488  [pdf, other]

    eess.AS cs.CL cs.SD

    Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition

    Authors: Hao Yen, Pin-Jui Ku, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging upon (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation). Specifically, Wav2Vec2.0 is used to generate robust speech representations, followed by a linear output layer to produce attribute sequences. A non-trainable…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2309.08828  [pdf, other]

    eess.AS cs.SD

    Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

    Authors: Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

    Abstract: We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally" across all spoken languages, referred to as speech attributes, namely manner and place of articulation. Specifically, several deterministic attribute-to-phoneme map…

    Submitted 15 September, 2023; originally announced September 2023.

  6. arXiv:2309.03900  [pdf, other]

    eess.IV cs.CV

    Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction

    Authors: Su-Kai Chen, Hung-Lin Yen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Wen-Hsiao Peng, Yen-Yu Lin

    Abstract: Deep learning is commonly used to reconstruct HDR images from LDR images. LDR stack-based methods are used for single-image HDR reconstruction, generating an HDR image from a deep learning-generated LDR stack. However, current methods generate the stack with predetermined exposure values (EVs), which may limit the quality of HDR reconstruction. To address this, we propose the continuous exposure v…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Project page: https://skchen1993.github.io/CEVR_web/

  7. arXiv:2211.02527  [pdf, other]

    eess.AS cs.SD

    Cold Diffusion for Speech Enhancement

    Authors: Hao Yen, François G. Germain, Gordon Wichern, Jonathan Le Roux

    Abstract: Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals. In this work, we explore the possibility of leveraging a recently proposed advanced iterative diffusion model, namely cold diffusion, to recover clean speech signals from noisy signals. The unique mathematical properties o…

    Submitted 23 May, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: 5 pages, 1 figure, 1 table, 3 algorithms. To appear in ICASSP 2023. With corrected references

  8. arXiv:2210.16726  [pdf, ps, other]

    eess.AS cs.SD

    Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings

    Authors: Hao Yen, Woojay Jeon

    Abstract: In embedding-matching acoustic-to-word (A2W) ASR, every word in the vocabulary is represented by a fixed-dimension embedding vector that can be added or removed independently of the rest of the system. The approach is potentially an elegant solution for the dynamic out-of-vocabulary (OOV) words problem, where speaker- and context-dependent named entities like contact names must be incorporated int…

    Submitted 19 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to ICASSP 2023

  9. arXiv:2202.13588   

    eess.IV cs.CV

    Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge

    Authors: Chia-Yen Lee, Hsiang-Chin Chien, Ching-Ping Wang, Hong Yen, Kai-Wen Zhen, Hong-Kun Lin

    Abstract: Colorectal cancer is one of the most common cancers worldwide, so early pathological examination is very important. However, it is time-consuming and labor-intensive to identify the number and type of cells on H&E images in clinical. Therefore, automatic segmentation and classification task and counting the cellular composition of H&E images from pathological sections is proposed by CoNIC Challeng…

    Submitted 16 April, 2024; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Errors have been identified in the analysis

  10. arXiv:2110.03894  [pdf, other]

    eess.AS cs.AI cs.LG cs.NE cs.SD

    Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

    Authors: Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

    Abstract: In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and further improve the stability of AR, w…

    Submitted 30 October, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted to Interspeech 2023. Code is available at: https://github.com/dodohow1011/SpeechAdvReprogram. Selected as Best Student Paper Candidate

  11. arXiv:2107.01461  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification

    Authors: Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee

    Abstract: We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC). Specifically, we tackle the ASC task in a low-resource environment leveraging a recently proposed advanced neural network pruning mechanism, namely Lottery Ticket Hypothesis (LTH), to find a sub-network neural model a…

    Submitted 1 May, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

    Comments: 5 figures. DCASE 2021. The project started in November 2020. Revised version