Showing 1–4 of 4 results for author: Rautenberg, F

  1. arXiv:2501.08791

    eess.AS cs.SD

    Speech Synthesis along Perceptual Voice Quality Dimensions

    Authors: Frederik Rautenberg, Michael Kuhlmann, Fritz Seebauer, Jana Wiechmann, Petra Wagner, Reinhold Haeb-Umbach

    Abstract: While expressive speech synthesis or voice conversion systems mainly focus on controlling or manipulating abstract prosodic characteristics of speech, such as emotion or accent, we here address the control of perceptual voice qualities (PVQs) recognized by phonetic experts, which are speech properties at a lower level of abstraction. The ability to manipulate PVQs can be a valuable tool for teachi…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  2. arXiv:2409.03520

    eess.AS eess.SP

    Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder

    Authors: Yuying Xie, Michael Kuhlmann, Frederik Rautenberg, Zheng-Hua Tan, Reinhold Haeb-Umbach

    Abstract: Speech signals encompass various information across multiple levels including content, speaker, and style. Disentanglement of these information, although challenging, is important for applications such as voice conversion. The contrastive predictive coding supported factorized variational autoencoder achieves unsupervised disentanglement of a speech signal into speaker and content embeddings by as…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by EUSIPCO 2024

  3. arXiv:2310.12599

    eess.AS

    On Feature Importance and Interpretability of Speaker Representations

    Authors: Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

    Abstract: Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask, which properties of a speaker's voice are captured and investigate to which extent do individual embedding ve…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Presented at the ITG conference on Speech Communication 2023

  4. arXiv:2106.05627

    cs.SD eess.AS

    A Comparison and Combination of Unsupervised Blind Source Separation Techniques

    Authors: Christoph Boeddeker, Frederik Rautenberg, Reinhold Haeb-Umbach

    Abstract: Unsupervised blind source separation methods do not require a training phase and thus cannot suffer from a train-test mismatch, which is a common concern in neural network based source separation. The unsupervised techniques can be categorized in two classes, those building upon the sparsity of speech in the Short-Time Fourier transform domain and those exploiting non-Gaussianity or non-stationari…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Submitted to ITG 2021