Showing 1–11 of 11 results for author: Khoong, W H

  1. Spectrum-based Modality Representation Fusion Graph Convolutional Network for Multimodal Recommendation

    Authors: Rongqing Kenneth Ong, Andy W. H. Khong

    Abstract: Incorporating multi-modal features as side information has recently become a trend in recommender systems. To elucidate user-item preferences, recent studies focus on fusing modalities via concatenation, element-wise sum, or attention mechanisms. Despite having notable success, existing approaches do not account for the modality-specific noise encapsulated within each modality. As a result, direct…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: Accepted to ACM Web Search and Data Mining (WSDM) 2025

  2. arXiv:2309.16953  [pdf, other]

    eess.AS cs.SD

    Enhancing Code-switching Speech Recognition with Interactive Language Biases

    Authors: Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

    Abstract: Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteri…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  3. arXiv:2305.18925  [pdf, other]

    eess.AS cs.CL cs.SD

    Investigating model performance in language identification: beyond simple error statistics

    Authors: Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels

    Abstract: Language development experts need tools that can automatically identify languages from fluent, conversational speech, and provide reliable estimates of usage rates at the level of an individual recording. However, language identification systems are typically evaluated on metrics such as equal error rate and balanced accuracy, applied at the level of an entire speech corpus. These overview metrics…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to Interspeech 2023, 5 pages, 5 figures

  4. Improving performance of real-time full-band blind packet-loss concealment with predictive network

    Authors: Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong

    Abstract: Packet loss concealment (PLC) is a tool for enhancing speech degradation caused by poor network conditions or underflow/overflow in audio processing pipelines. We propose a real-time recurrent method that leverages previous outputs to mitigate artefact of lost packets without the prior knowledge of loss mask. The proposed full-band recurrent network (FRN) model operates at 48 kHz, which is suitabl…

    Submitted 12 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: In Proceedings ICASSP 2023, 5 pages, 1 figure, 4 tables

  5. arXiv:2210.14567  [pdf, other]

    eess.AS cs.SD

    Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

    Authors: Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

    Abstract: Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  6. arXiv:2203.03218  [pdf, other]

    eess.AS cs.CL cs.SD

    Enhance Language Identification using Dual-mode Model with Knowledge Distillation

    Authors: Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, Sanjeev Khudanpur

    Abstract: In this paper, we propose to employ a dual-mode framework on the x-vector self-attention (XSA-LID) model with knowledge distillation (KD) to enhance its language identification (LID) performance for both long and short utterances. The dual-mode XSA-LID model is trained by jointly optimizing both the full and short modes with their respective inputs being the full-length speech and its short clip e…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Submitted to Odyssey 2022

  7. TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining

    Authors: Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong

    Abstract: We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the qual…

    Submitted 7 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICASSP 2022, 5 pages, 4 figures, 3 tables

  8. arXiv:2102.00196  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures

    Authors: Karn Watcharasupat, Anh H. T. Nguyen, Ching-Hui Ooi, Andy W. H. Khong

    Abstract: In blind source separation of speech signals, the inherent imbalance in the source spectrum poses a challenge for methods that rely on single-source dominance for the estimation of the mixing matrix. We propose an algorithm based on the directional sparse filtering (DSF) framework that utilizes the Lehmer mean with learnable weights to adaptively account for source imbalance. Performance evaluatio…

    Submitted 14 May, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

    Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489

  9. arXiv:2003.01581  [pdf, other]

    eess.IV cs.CV

    BUSU-Net: An Ensemble U-Net Framework for Medical Image Segmentation

    Authors: Wei Hao Khoong

    Abstract: In recent years, convolutional neural networks (CNNs) have revolutionized medical image analysis. One of the most well-known CNN architectures in semantic segmentation is the U-net, which has achieved much success in several medical image segmentation applications. Also more recently, with the rise of autoML and advancements in neural architecture search (NAS), methods like NAS-Unet have been propo…

    Submitted 8 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: GitHub link to the model scripts and trained model weights can be found in the manuscript. Version 2: Added S-UNet's Mi-UNet results for comparison and reference

  10. arXiv:2002.00167  [pdf, other]

    cs.DS math.CO math.OC

    Solving the Joint Order Batching and Picker Routing Problem for Large Instances

    Authors: Wei Hao Khoong

    Abstract: In this work, we investigate the problem of order batching and picker routing in warehouse storage areas. These problems are known to be capital and labour intensive, and often contribute to a sizable fraction of warehouse operating costs. Here, we consider the case of online grocery shopping where orders may consist of dozens of items. We present the problem introduced and tackle the issue of s…

    Submitted 1 February, 2020; originally announced February 2020.

    Comments: Thesis submitted to the department end April 2019. Please cite the year as 2019 instead. Data and results can be found at https://github.com/weihao94/Solving-the-Joint-Order-Batching-and-Picker-Routing-Problem-for-Large-Instances

  11. arXiv:1909.12226  [pdf, other]

    cs.LG

    A Heuristic for Efficient Reduction in Hidden Layer Combinations For Feedforward Neural Networks

    Authors: Wei Hao Khoong

    Abstract: In this paper, we describe the hyper-parameter search problem in the field of machine learning and present a heuristic approach in an attempt to tackle it. In most learning algorithms, a set of hyper-parameters must be determined before training commences. The choice of hyper-parameters can affect the final model's performance significantly, yet determining a good choice of hyper-parameters is…

    Submitted 11 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: To appear in the proceedings of the 2020 Computing Conference