Showing 1–7 of 7 results for author: Wierstorf, H

  1. arXiv:2408.13920

    cs.SD eess.AS

    Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition

    Authors: Dionyssos Kounadis-Bastian, Oliver Schrüfer, Anna Derington, Hagen Wierstorf, Florian Eyben, Felix Burkhardt, Björn Schuller

    Abstract: Speech Emotion Recognition (SER) needs high computational resources to overcome the challenge of substantial annotator disagreement. Today SER is shifting towards dimensional annotations of arousal, dominance, and valence (A/D/V). Universal metrics as the L2 distance prove unsuitable for evaluating A/D/V accuracy due to non converging consensus of annotator opinions. However, Concordance Correlati…

    Submitted 22 November, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: apply review

  2. arXiv:2312.06270

    eess.AS cs.SD

    Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models

    Authors: Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller

    Abstract: Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated based on a few available datasets per task. Tasks could include arousal, valence, dominance, emotional categories, or tone of voice. Those models are mainly evaluated in terms of correlation or recall, and always show some errors in their predictions. The errors manifest themse…

    Submitted 12 February, 2025; v1 submitted 11 December, 2023; originally announced December 2023.

  3. arXiv:2306.16962

    cs.SD eess.AS

    Speech-based Age and Gender Prediction with Transformers

    Authors: Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben, Björn Schuller

    Abstract: We report on the curation of several publicly available datasets for age and gender prediction. Furthermore, we present experiments to predict age and gender with models based on a pre-trained wav2vec 2.0. Depending on the dataset, we achieve an MAE between 7.1 years and 10.8 years for age, and at least 91.1% ACC for gender (female, male, child). Compared to a modelling approach built on handcraft…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 5 pages, submitted to 15th ITG Conference on Speech Communication

  4. arXiv:2303.00645

    eess.AS cs.SD

    audb -- Sharing and Versioning of Audio and Annotation Data in Python

    Authors: Hagen Wierstorf, Johannes Wagner, Florian Eyben, Felix Burkhardt, Björn W. Schuller

    Abstract: Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is rapidly growing. audb is an open-source Python library that supports versioning and documentation of audio datasets. It aims to provide a standardized and simple user-interface to publish, maintain, and access the annotations and audio files of…

    Submitted 10 May, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  5. arXiv:2203.07378

    eess.AS cs.LG cs.SD

    Dawn of the transformer era in speech emotion recognition: closing the valence gap

    Authors: Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Björn W. Schuller

    Abstract: Recent advances in transformer-based architectures which are pre-trained in self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field of speech emotion recognition (SER). However, existing works have not evaluated the influence of model size and pre-training data on downstream perform…

    Submitted 7 September, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

    Journal ref: in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10745-10759, 1 Sept. 2023

  6. arXiv:1811.00454

    cs.SD cs.LG cs.MM eess.AS

    Referenceless Performance Evaluation of Audio Source Separation using Deep Neural Networks

    Authors: Emad M. Grais, Hagen Wierstorf, Dominic Ward, Russell Mason, Mark D. Plumbley

    Abstract: Current performance evaluation for audio source separation depends on comparing the processed or separated signals with reference signals. Therefore, common performance evaluation toolkits are not applicable to real-world situations where the ground truth audio is unavailable. In this paper, we propose a performance evaluation technique that does not require reference signals in order to assess se…

    Submitted 1 November, 2018; originally announced November 2018.

    MSC Class: 68T01; 68T10; 68T45; 62H25

    ACM Class: H.5.5; I.5; I.2.6; I.4.3; I.4; I.2

    Journal ref: This paper will be presented at EUSIPCO 2019

  7. arXiv:1710.11473

    cs.SD cs.CV cs.LG eess.AS

    Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation

    Authors: Emad M. Grais, Hagen Wierstorf, Dominic Ward, Mark D. Plumbley

    Abstract: In deep neural networks with convolutional layers, each layer typically has fixed-size/single-resolution receptive field (RF). Convolutional layers with a large RF capture global information from the input features, while layers with small RF size capture local details with high resolution from the input features. In this work, we introduce novel deep multi-resolution fully convolutional neural ne…

    Submitted 28 October, 2017; originally announced October 2017.

    Comments: arXiv admin note: text overlap with arXiv:1703.08019

    MSC Class: 68T01

    ACM Class: H.5.5; I.5; I.2.6; I.4.3; I.4; I.2