Showing 1–3 of 3 results for author: Artzi, Y

Searching in archive eess.

  1. arXiv:2205.01086  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages

    Authors: Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi

    Abstract: We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data. We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task -- transcribing audio inputs into pseudo subword sequences. This process stands on its own, or can be applied as low-cost second-stage pre-training…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: Code available at https://github.com/asappresearch/wav2seq

  2. arXiv:2111.10367  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech

    Authors: Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han

    Abstract: Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level spoken language understanding tasks, including using end-to-end models, but there are fewer annotated datasets for such tasks. At the same time, rece…

    Submitted 29 July, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

    Comments: Updated preprint for SLUE Benchmark v0.2; Toolkit link https://github.com/asappresearch/slue-toolkit

  3. arXiv:2109.06870  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

    Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

    Abstract: This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improveme…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Code available at https://github.com/asappresearch/sew