Showing 1–3 of 3 results for author: Leutnant, V

Searching in archive eess.
  1. arXiv:2505.13085  [pdf, other]

    eess.AS cs.LG

    Universal Semantic Disentangled Privacy-preserving Speech Representation Learning

    Authors: Biel Tura Vecino, Subhadeep Maji, Aravind Varier, Antonio Bonafonte, Ivan Valles, Michael Owen, Leif Rädel, Grant Strimel, Seyi Feyisetan, Roberto Barra Chicote, Ariya Rastrow, Constantinos Papayiannis, Volker Leutnant, Trevor Wood

    Abstract: The use of audio recordings of human speech to train LLMs poses privacy concerns due to these models' potential to generate outputs that closely resemble artifacts in the training data. In this study, we propose a speaker privacy-preserving representation learning method through the Universal Speech Codec (USC), a computationally efficient encoder-decoder model that disentangles speech into: (i) p…

    Submitted 20 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Extended report of the article accepted at Interspeech 2025 (v1)

  2. arXiv:2401.07360  [pdf, other]

    cs.CL cs.SD eess.AS

    Promptformer: Prompted Conformer Transducer for ASR

    Authors: Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif Rädel, Volker Leutnant

    Abstract: Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rW…

    Submitted 14 January, 2024; originally announced January 2024.

  3. arXiv:2111.10157  [pdf, other]

    cs.CL cs.SD eess.AS

    Lattention: Lattice-attention in ASR rescoring

    Authors: Prabhat Pandey, Sergio Duarte Torres, Ali Orkan Bayer, Ankur Gandhe, Volker Leutnant

    Abstract: Lattices form a compact representation of multiple hypotheses generated from an automatic speech recognition system and have been shown to improve performance of downstream tasks like spoken language understanding and speech translation, compared to using one-best hypothesis. In this work, we look into the effectiveness of lattice cues for rescoring n-best lists in second-pass. We encode lattices…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022

    ACM Class: I.2.7