Showing 1–1 of 1 results for author: Farag, Y

  1. arXiv:2408.16423  [pdf, other]

    eess.AS cs.SD

    WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding

    Authors: Mohan Li, Cong-Thanh Do, Simon Keizer, Youmna Farag, Svetlana Stoyanchev, Rama Doddipatla

    Abstract: Speech large language models (speech-LLMs) integrate speech and text-based foundation models to provide a unified framework for handling a wide range of downstream tasks. In this paper, we introduce WHISMA, a speech-LLM tailored for spoken language understanding (SLU) that demonstrates robust performance in various zero-shot settings. WHISMA combines the speech encoder from Whisper with the Llama-…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to SLT 2024