Showing 1–4 of 4 results for author: Hsu, M

Searching in archive eess.
  1. arXiv:2409.10429  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition

    Authors: Ming-Hao Hsu, Hung-yi Lee

    Abstract: Automatic Speech Recognition (ASR) models demonstrate outstanding performance on high-resource languages but face significant challenges when applied to low-resource languages due to limited training data and insufficient cross-lingual generalization. Existing adaptation strategies, such as shallow fusion, data augmentation, and direct fine-tuning, either rely on external resources, suffer computa…

    Submitted 14 June, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

  2. arXiv:2408.03485  [pdf, ps, other]

    eess.SP

    Sub-Resolution mmWave FMCW Radar-based Touch Localization using Deep Learning

    Authors: Raghunandan M. Rao, Amit Kachroo, Koushik A. Manjunatha, Morris Hsu, Rohit Kumar

    Abstract: Touchscreen-based interaction on display devices is ubiquitous nowadays. However, capacitive touch screens, the core technology that enables its widespread use, are prohibitively expensive to use in large displays because the cost increases proportionally with the screen area. In this paper, we propose a millimeter wave (mmWave) radar-based solution to achieve sub-resolution error performance…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 7 pages, 9 figures and 2 tables. To appear in the 100th Vehicular Technology Conference (VTC-Fall 2024)

  3. arXiv:2310.12477  [pdf, other]

    eess.AS cs.AI cs.CL

    Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks

    Authors: Ming-Hao Hsu, Kai-Wei Chang, Shang-Wen Li, Hung-yi Lee

    Abstract: Ever since the development of GPT-3 in the natural language processing (NLP) field, in-context learning (ICL) has played an essential role in utilizing large language models (LLMs). By presenting the LM utterance-label demonstrations at the input, the LM can accomplish few-shot learning without relying on gradient descent or requiring explicit modification of its parameters. This enables the LM to…

    Submitted 15 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted to Interspeech 2024. The first two authors contributed equally, and their order is random.

  4. arXiv:1808.09351  [pdf, other]

    cs.CV cs.GR eess.IV

    3D-Aware Scene Manipulation via Inverse Graphics

    Authors: Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

    Abstract: We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous scene representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by…

    Submitted 18 December, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: NeurIPS 2018. Code: https://github.com/ysymyth/3D-SDN Website: http://3dsdn.csail.mit.edu/