Showing 1–3 of 3 results for author: Macherey, W

Searching in archive eess.
  1. arXiv:2212.09553 [pdf, other]

    cs.CL cs.SD eess.AS

    Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models

    Authors: Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna

    Abstract: We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^{2}$SLAM trains the speech-text models with a sequence-t…

    Submitted 26 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: ICML 2023

  2. arXiv:1904.06037 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Direct speech-to-speech translation with a sequence-to-sequence model

    Authors: Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu

    Abstract: We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice).…

    Submitted 25 June, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: Accepted to Interspeech 2019

  3. arXiv:1811.02050 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation

    Authors: Ye Jia, Melvin Johnson, Wolfgang Macherey, Ron J. Weiss, Yuan Cao, Chung-Cheng Chiu, Naveen Ari, Stella Laurenzo, Yonghui Wu

    Abstract: End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora…

    Submitted 10 February, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: ICASSP 2019