Showing 1–2 of 2 results for author: Ha, H H

  1. arXiv:2505.17417  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Speechless: Speech Instruction Training Without Speech for Low Resource Languages

    Authors: Alan Dao, Dinh Bach Vu, Huy Hoang Ha, Tuan Le Duc Anh, Shreyas Gopal, Yue Heng Yeo, Warren Keng Hoong Low, Eng Siong Chng, Jia Qi Yip

    Abstract: The rapid growth of voice assistants powered by large language models (LLM) has highlighted a need for speech instruction data to train these systems. Despite the abundance of speech recognition data, there is a notable scarcity of speech instruction data, which is essential for fine-tuning models to understand and execute spoken commands. Generating high-quality synthetic speech requires a good t…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: This paper was accepted by INTERSPEECH 2025

  2. arXiv:2410.15316  [pdf, other]

    cs.CL cs.SD eess.AS

    Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

    Authors: Alan Dao, Dinh Bach Vu, Huy Hoang Ha

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of integrating audio and text modalities. This paper introduces Ichigo, a mixed-modal model that seamlessly processes interleaved sequences of speech and text. Utilizing a tokenized early-fusion approach, Ichigo quantizes speech into…

    Submitted 4 April, 2025; v1 submitted 20 October, 2024; originally announced October 2024.