Showing 1–4 of 4 results for author: Ratajczak, M

Searching in archive cs.
  1. arXiv:2506.19761  [pdf, ps, other]

    cs.CL

    Accurate, fast, cheap: Choose three. Replacing Multi-Head-Attention with Bidirectional Recurrent Attention for Long-Form ASR

    Authors: Martin Ratajczak, Jean-Philippe Robichaud, Jennifer Drexler Fox

    Abstract: Long-form speech recognition is an application area of increasing research focus. ASR models based on multi-head attention (MHA) are ill-suited to long-form ASR because of their quadratic complexity in sequence length. We build on recent work that has investigated linear complexity recurrent attention (RA) layers for ASR. We find that bidirectional RA layers can match the accuracy of MHA for both…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  2. arXiv:2412.07937  [pdf, other]

    cs.CL

    Style-agnostic evaluation of ASR using multiple reference transcripts

    Authors: Quinten McNamara, Miguel Ángel del Río Fernández, Nishchal Bhandari, Martin Ratajczak, Danny Chen, Corey Miller, Migüel Jetté

    Abstract: Word error rate (WER) as a metric has a variety of limitations that have plagued the field of speech recognition. Evaluation datasets suffer from varying style, formality, and inherent ambiguity of the transcription task. In this work, we attempt to mitigate some of these differences by performing style-agnostic evaluation of ASR systems using multiple references transcribed under opposing style p…

    Submitted 10 December, 2024; originally announced December 2024.

  3. arXiv:2410.03930  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Reverb: Open-Source ASR and Diarization from Rev

    Authors: Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud

    Abstract: Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all exi…

    Submitted 24 February, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  4. arXiv:1807.02324  [pdf, other]

    cs.LG stat.ML

    Sum-Product Networks for Sequence Labeling

    Authors: Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf

    Abstract: We consider higher-order linear-chain conditional random fields (HO-LC-CRFs) for sequence modelling, and use sum-product networks (SPNs) for representing higher-order input- and output-dependent factors. SPNs are a recently introduced class of deep models for which exact and efficient inference can be performed. By combining HO-LC-CRFs with SPNs, expressive models over both the output labels and t…

    Submitted 6 July, 2018; originally announced July 2018.