
Showing 1–27 of 27 results for author: Triantafyllopoulos, A

Searching in archive eess.
  1. arXiv:2505.24493  [pdf, ps, other]

    cs.AI cs.SD eess.AS

    MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge

    Authors: Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller

    Abstract: Although speech emotion recognition (SER) has advanced significantly with deep learning, annotation remains a major hurdle. Human annotation is not only costly but also subject to inconsistencies: annotators often have different preferences and may lack the necessary contextual knowledge, which can lead to varied and inaccurate labels. Meanwhile, Large Language Models (LLMs) have emerged as a scala…

    Submitted 30 May, 2025; originally announced May 2025.

  2. arXiv:2504.20776  [pdf]

    cs.SD cs.AI eess.AS

    ECOSoundSet: a finely annotated dataset for the automated acoustic identification of Orthoptera and Cicadidae in North, Central and temperate Western Europe

    Authors: David Funosas, Elodie Massol, Yves Bas, Svenja Schmidt, Dominik Arend, Alexander Gebhard, Luc Barbaro, Sebastian König, Rafael Carbonell Font, David Sannier, Fernand Deroussen, Jérôme Sueur, Christian Roesti, Tomi Trilar, Wolfgang Forstmeier, Lucas Roger, Eloïsa Matheu, Piotr Guzik, Julien Barataud, Laurent Pelozuelo, Stéphane Puissant, Sandra Mueller, Björn Schuller, Jose M. Montoya, Andreas Triantafyllopoulos, et al. (1 additional author not shown)

    Abstract: Currently available tools for the automated acoustic recognition of European insects in natural soundscapes are limited in scope. Large and ecologically heterogeneous acoustic datasets are currently needed for these algorithms to cross-contextually recognize the subtle and complex acoustic signatures produced by each species, thus making the availability of such datasets a key requisite for their…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 3 Figures + 2 Supplementary Figures, 2 Tables + 3 Supplementary Tables

  3. arXiv:2501.10525  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids

    Authors: Iosif Tsangko, Andreas Triantafyllopoulos, Michael Müller, Hendrik Schröter, Björn W. Schuller

    Abstract: The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a 'one-size-fits-all' approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper it…

    Submitted 23 January, 2025; v1 submitted 17 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025. 5 pages, 3 figures

    ACM Class: I.2.6; H.5.5; I.5.1; I.4.8

  4. arXiv:2412.11943  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks

    Authors: Simon Rampp, Andreas Triantafyllopoulos, Manuel Milling, Björn W. Schuller

    Abstract: This work introduces the key operating principles for autrainer, our new deep learning training framework for computer audition tasks. autrainer is a PyTorch-based toolkit that allows for rapid, reproducible, and easily extensible training on a variety of different computer audition tasks. Concretely, autrainer offers low-code training and supports a wide range of neural networks as well as prepro…

    Submitted 10 April, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

  5. arXiv:2409.06451  [pdf, other]

    cs.SD eess.AS

    Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models

    Authors: Xin Jing, Kun Zhou, Andreas Triantafyllopoulos, Björn W. Schuller

    Abstract: While current emotional text-to-speech (TTS) systems can generate highly intelligible emotional speech, achieving fine control over emotion rendering of the output speech still remains a significant challenge. In this paper, we introduce ParaEVITS, a novel emotional TTS framework that leverages the compositionality of natural language to enhance control over emotional rendering. By incorporating a…

    Submitted 10 September, 2024; originally announced September 2024.

  6. arXiv:2408.06264  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance

    Authors: Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller

    Abstract: Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front-end of the target audio applications. In this paper, we present an end-to-end learning solu…

    Submitted 12 August, 2024; originally announced August 2024.

  7. Abusive Speech Detection in Indic Languages Using Acoustic Features

    Authors: Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko, Björn W. Schuller

    Abstract: Abusive content in online social networks is a well-known problem that can cause serious psychological harm and incite hatred. The ability to upload audio data increases the importance of developing methods to detect abusive content in speech recordings. However, simply transferring the mechanisms from written abuse detection would ignore relevant information such as emotion and tone. In addition,…

    Submitted 30 July, 2024; originally announced July 2024.

    Journal ref: Proc. INTERSPEECH 2023, 2683-2687

  8. arXiv:2407.15672  [pdf, other]

    cs.SD eess.AS

    Computer Audition: From Task-Specific Machine Learning to Foundation Models

    Authors: Andreas Triantafyllopoulos, Iosif Tsangko, Alexander Gebhard, Annamaria Mesaros, Tuomas Virtanen, Björn Schuller

    Abstract: Foundation models (FMs) are increasingly spearheading recent advances on a variety of tasks that fall under the purview of computer audition -- the use of machines to understand sounds. They feature several advantages over traditional pipelines: among others, the ability to consolidate multiple tasks in a single model, the option to leverage knowledge from other modalities, and the readily-availab…

    Submitted 22 July, 2024; originally announced July 2024.

  9. arXiv:2406.07203  [pdf, other]

    cs.SD eess.AS

    ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks

    Authors: Xin Jing, Andreas Triantafyllopoulos, Björn Schuller

    Abstract: Contrastive language-audio pretraining (CLAP) has recently emerged as a method for making audio analysis more generalisable. Specifically, CLAP-style models are able to 'answer' a diverse set of language queries, extending the capabilities of audio models beyond a closed set of labels. However, CLAP relies on a large set of (audio, query) pairs for pretraining. While such sets are available for ge…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  10. Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines

    Authors: Philipp Wagner, Andreas Triantafyllopoulos, Alexander Gebhard, Björn Schuller

    Abstract: In recent decades, running has become an increasingly popular pastime activity due to its accessibility, ease of practice, and anticipated health benefits. However, the risk of running-related injuries is substantial for runners of different experience levels. Several common forms of injuries result from overuse -- extending beyond the recommended running time and intensity. Recently, audio-based…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at EUSIPCO 2024

  11. An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats

    Authors: Andreas Triantafyllopoulos, Alexander Gebhard, Manuel Milling, Simon Rampp, Björn Schuller

    Abstract: Prior work in computational bioacoustics has mostly focused on the detection of animal presence in a particular habitat. However, animal sounds contain much richer information than mere presence; among others, they encapsulate the interactions of those animals with other members of their species. Studying these interactions is almost impossible in a naturalistic setting, as the ground truth is oft…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at EUSIPCO 2024

  12. arXiv:2309.16369  [pdf, other]

    cs.SD cs.LG eess.AS

    Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification

    Authors: Manuel Milling, Andreas Triantafyllopoulos, Iosif Tsangko, Simon David Noel Rampp, Björn Wolfgang Schuller

    Abstract: The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the acoustic scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensi…

    Submitted 15 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication

  13. Exploring Meta Information for Audio-based Zero-shot Bird Classification

    Authors: Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller

    Abstract: Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich…

    Submitted 11 June, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  14. arXiv:2305.13195  [pdf, other]

    cs.SD eess.AS

    U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech

    Authors: Xin Jing, Yi Chang, Zijiang Yang, Jiangjian Xie, Andreas Triantafyllopoulos, Bjoern W. Schuller

    Abstract: Deep learning has led to considerable advances in text-to-speech synthesis. Most recently, the adoption of Score-based Generative Models (SGMs), also known as Diffusion Probabilistic Models (DPMs), has gained traction due to their ability to produce high-quality synthesized neural speech in neural speech synthesis systems. In SGMs, the U-Net architecture and its variants have long dominated as the…

    Submitted 22 May, 2023; originally announced May 2023.

  15. arXiv:2304.14882  [pdf, other]

    cs.SD cs.LG eess.AS

    The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

    Authors: Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié

    Abstract: The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenge, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classi…

    Submitted 1 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: 5 pages, part of the ACM Multimedia 2023 Grand Challenge "The ACM Multimedia 2023 Computational Paralinguistics Challenge (ComParE 2023)". arXiv admin note: text overlap with arXiv:2205.06799

    MSC Class: 68

    ACM Class: I.2.7; I.5.0; J.3

  16. arXiv:2301.10477  [pdf, other]

    cs.SD cs.CY eess.AS

    HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

    Authors: Andreas Triantafyllopoulos, Alexander Kathan, Alice Baird, Lukas Christ, Alexander Gebhard, Maurice Gerczuk, Vincent Karas, Tobias Hübner, Xin Jing, Shuo Liu, Adria Mallol-Ragolta, Manuel Milling, Sandra Ottl, Anastasia Semertzidou, Srividya Tirunellai Rajamani, Tianhao Yan, Zijiang Yang, Judith Dineley, Shahin Amiriparian, Katrin D. Bartl-Pokorny, Anton Batliner, Florian B. Pokorny, Björn W. Schuller

    Abstract: Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems to their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies; first and foremost in the fields of medical imaging, but also in the use of wearable…

    Submitted 25 January, 2023; originally announced January 2023.

  17. arXiv:2209.07384  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts

    Authors: Vincent Karas, Andreas Triantafyllopoulos, Meishu Song, Björn W. Schuller

    Abstract: Vocal bursts play an important role in communicating affect, making them valuable for improving speech emotion recognition. Here, we present our approach for classifying vocal bursts and predicting their emotional significance in the ACII Affective Vocal Burst Workshop & Challenge 2022 (A-VB). We use a large self-supervised audio model as shared feature extractor and compare multiple architectures…

    Submitted 27 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: 4 pages, 1 figure, accepted at The 2022 ACII Affective Vocal Burst Workshop & Challenge (A-VB)

  18. Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease

    Authors: Andreas Triantafyllopoulos, Markus Fendler, Anton Batliner, Maurice Gerczuk, Shahin Amiriparian, Thomas M. Berghaus, Björn W. Schuller

    Abstract: Chronic obstructive pulmonary disease (COPD) causes lung inflammation and airflow blockage leading to a variety of respiratory symptoms; it is also a leading cause of death and affects millions of individuals around the world. Patients often require treatment and hospitalisation, while no cure is currently available. As COPD predominantly affects the respiratory system, speech and non-linguistic v…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Accepted in INTERSPEECH 2022

    Journal ref: Proc. Interspeech 2022, 3623-3627

  19. arXiv:2206.11049  [pdf, other]

    cs.SD cs.LG eess.AS

    Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression

    Authors: Meishu Song, Zijiang Yang, Andreas Triantafyllopoulos, Xin Jing, Vincent Karas, Xie Jiangjian, Zixing Zhang, Yamamoto Yoshiharu, Bjoern W. Schuller

    Abstract: We propose a novel Dynamic Restrained Uncertainty Weighting Loss to experimentally handle the problem of balancing the contributions of multiple tasks on the ICML ExVo 2022 Challenge. The multitask aims to recognize expressed emotions and demographic traits from vocal bursts jointly. Our strategy combines the advantages of Uncertainty Weight and Dynamic Weight Average, by extending weights with a…

    Submitted 27 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: 5 pages

  20. arXiv:2206.11045  [pdf, other]

    eess.AS cs.LG cs.SD

    COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

    Authors: Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller

    Abstract: More than two years after its outbreak, the COVID-19 pandemic continues to plague medical systems around the world, putting a strain on scarce resources, and claiming human lives. From the very beginning, various AI-based COVID-19 detection and monitoring tools have been pursued in an attempt to stem the tide of infections through timely diagnosis. In particular, computer audition has been suggest…

    Submitted 20 June, 2022; originally announced June 2022.

  21. arXiv:2206.09142  [pdf, other]

    cs.SD eess.AS

    Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression

    Authors: Xin Jing, Meishu Song, Andreas Triantafyllopoulos, Zijiang Yang, Björn W. Schuller

    Abstract: In this paper, we propose the Redundancy Reduction Twins Network (RRTN), a redundancy reduction training framework that minimizes redundancy by measuring the cross-correlation matrix between the outputs of the same network fed with distorted versions of a sample and bringing it as close to the identity matrix as possible. RRTN also applies a new loss function, the Barlow Twins loss function, to he…

    Submitted 28 June, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 5 pages, accepted by the ICML ExVo workshop

  22. arXiv:2206.06680  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

    Authors: Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Xin Jing, Björn W. Schuller

    Abstract: In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an 'enrolment' encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of 'soft' feature selection. The emotion and enr…

    Submitted 20 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the 39th International Conference on Machine Learning, Copyright 2022 by the author(s)

  23. arXiv:2205.04343  [pdf, other]

    cs.SD cs.LG eess.AS

    Fatigue Prediction in Outdoor Running Conditions using Audio Data

    Authors: Andreas Triantafyllopoulos, Sandra Ottl, Alexander Gebhard, Esther Rituerto-González, Mirko Jaumann, Steffen Hüttner, Valerie Dieter, Patrick Schneeweiß, Inga Krauß, Maurice Gerczuk, Shahin Amiriparian, Björn W. Schuller

    Abstract: Although running is a common leisure activity and a core training regimen for several athletes, between 29% and 79% of runners sustain an overuse injury each year. These injuries are linked to excessive fatigue, which alters how someone runs. In this work, we explore the feasibility of modelling the Borg rating of perceived exertion (RPE) scale (range: 6–20), a well-validated subject…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: Paper accepted at IEEE EMBC 2022. Rights remain with IEEE

  24. arXiv:2205.04328  [pdf, other]

    cs.SD cs.LG eess.AS

    Insights on Modelling Physiological, Appraisal, and Affective Indicators of Stress using Audio Features

    Authors: Andreas Triantafyllopoulos, Sandra Zänkert, Alice Baird, Julian Konzok, Brigitte M. Kudielka, Björn W. Schuller

    Abstract: Stress is a major threat to well-being that manifests in a variety of physiological and mental symptoms. Utilising speech samples collected while the subject is undergoing an induced stress episode has recently shown promising results for the automatic characterisation of individual stress responses. In this work, we introduce new findings that shed light onto whether speech signals are suited to…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: Paper accepted for publication at IEEE EMBC 2022. Rights remain with IEEE

  25. arXiv:2203.17012  [pdf, other]

    cs.SD cs.LG eess.AS

    A Temporal-oriented Broadcast ResNet for COVID-19 Detection

    Authors: Xin Jing, Shuo Liu, Emilia Parada-Cabaleiro, Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Björn W. Schuller

    Abstract: Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce the virus transmission. Due to the promising results of deep learning networks in modelling time sequences, and since applications to rapidly identify COVID in-the-wild should require low computational effort, we present a temporal-oriented broadcasting residual l…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: 5 pages, submitted to Interspeech 2022

  26. arXiv:2203.07378  [pdf, other]

    eess.AS cs.LG cs.SD

    Dawn of the transformer era in speech emotion recognition: closing the valence gap

    Authors: Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Björn W. Schuller

    Abstract: Recent advances in transformer-based architectures which are pre-trained in self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field of speech emotion recognition (SER). However, existing works have not evaluated the influence of model size and pre-training data on downstream perform…

    Submitted 7 September, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

    Journal ref: in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10745-10759, 1 Sept. 2023

  27. arXiv:1805.01222  [pdf, ps, other]

    cs.CV cs.LG cs.SD eess.AS stat.ML

    audEERING's approach to the One-Minute-Gradual Emotion Challenge

    Authors: Andreas Triantafyllopoulos, Hesam Sagha, Florian Eyben, Björn Schuller

    Abstract: This paper describes audEERING's submissions as well as additional evaluations for the One-Minute-Gradual (OMG) emotion recognition challenge. We provide the results for audio and video processing on subject (in)dependent evaluations. On the provided Development set, we achieved 0.343 Concordance Correlation Coefficient (CCC) for arousal (from audio) and 0.401 for valence (from video).

    Submitted 3 May, 2018; originally announced May 2018.

    Comments: 3 pages
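Entry 27 reports its results as a Concordance Correlation Coefficient (CCC). For reference, the following is a minimal Python sketch of Lin's CCC, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2), assuming population (1/N) statistics. It is not the evaluation code used by the challenge or the authors, and the function name concordance_cc is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def concordance_cc(preds: Sequence[float], golds: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (hypothetical helper name).

    Assumes population (1/N) variances and covariance.
    """
    if len(preds) != len(golds) or len(preds) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mp, mg = fmean(preds), fmean(golds)
    # Population variances of each series and their covariance.
    vp = fmean((p - mp) ** 2 for p in preds)
    vg = fmean((g - mg) ** 2 for g in golds)
    cov = fmean((p - mp) * (g - mg) for p, g in zip(preds, golds))
    # The squared mean difference penalises systematic bias, unlike Pearson's r.
    denom = vp + vg + (mp - mg) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2.0 * cov / denom
```

Unlike Pearson correlation, CCC also penalises offsets and scale differences: predictions [2, 3, 4] against gold labels [1, 2, 3] are perfectly correlated (Pearson r = 1) but give a CCC of 4/7, roughly 0.571.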
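Entry 21 describes a loss that pushes the cross-correlation matrix between the outputs for two distorted views of a sample towards the identity matrix, that is, the Barlow Twins objective (Zbontar et al., 2021). Below is a minimal pure-Python sketch of that standard objective, not the RRTN authors' implementation. The function names, the redundancy weight lambd = 5e-3, and the eps stabiliser are illustrative assumptions. Each feature is standardised over the batch, the D×D cross-correlation matrix C is formed, and the loss is sum_i (1 - C_ii)^2 + lambd · sum_{i≠j} C_ij^2.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]  # batch of N rows, each with D features


def _standardise(z: Matrix, eps: float = 1e-8) -> Matrix:
    """Zero-mean, unit-variance normalisation of each feature over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mu) / (sd + eps)  # eps avoids division by zero
    return out


def barlow_twins_loss(z_a: Matrix, z_b: Matrix, lambd: float = 5e-3) -> float:
    """Barlow Twins-style loss for two views' embeddings (hypothetical helper).

    lambd weights the off-diagonal (redundancy) term; its value is an assumption.
    """
    if len(z_a) != len(z_b) or len(z_a[0]) != len(z_b[0]):
        raise ValueError("both views must have the same batch size and width")
    n, d = len(z_a), len(z_a[0])
    a, b = _standardise(z_a), _standardise(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Entry C_ij of the batch cross-correlation matrix.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance: diagonal towards 1
            else:
                loss += lambd * c_ij ** 2  # redundancy: off-diagonal towards 0
    return loss
```

Two identical batches whose features are already decorrelated give a loss close to 0, whereas copying one feature into two columns keeps the diagonal at 1 but adds an off-diagonal penalty of 2·lambd.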