
Showing 1–9 of 9 results for author: Kegler, M

  1. arXiv:2403.14246  [pdf, other]

    eess.AS cs.AI

    CATSE: A Context-Aware Framework for Causal Target Sound Extraction

    Authors: Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

    Abstract: Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE m…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  2. arXiv:2403.12182  [pdf, other]

    eess.AS

    Latent CLAP Loss for Better Foley Sound Synthesis

    Authors: Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

    Abstract: Foley sound generation, the art of creating audio for multimedia, has recently seen notable advancements through text-conditioned latent diffusion models. These systems use multimodal text-audio representation models, such as Contrastive Language-Audio Pretraining (CLAP), whose objective is to map corresponding audio and text prompts into a joint embedding space. AudioLDM, a text-to-audio model, w…

    Submitted 18 March, 2024; originally announced March 2024.

    Journal ref: EUSIPCO 2024 Proceedings, ISBN: 978-9-4645-9361-7

  3. arXiv:2309.08144  [pdf, other]

    cs.SD cs.LG eess.AS

    Two-Step Knowledge Distillation for Tiny Speech Enhancement

    Authors: Rayan Daod Nathoo, Mikolaj Kegler, Marko Stamenovic

    Abstract: Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved via distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we firstly…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Under review ICASSP 2024

  4. arXiv:2211.02542  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD eess.SP

    Self-Supervised Learning for Speech Enhancement through Synthesis

    Authors: Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang

    Abstract: Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative speech synthesis, where the system's output is synthesized by a neural vocoder after an inherently lossy feature-denoising step. In this paper, we propose a denoisin…

    Submitted 4 November, 2022; originally announced November 2022.

  5. arXiv:2206.12038  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

    Authors: Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak

    Abstract: Methods for extracting audio and speech features have been studied since pioneering work on spectrum analysis decades ago. Recent efforts are guided by the ambition to develop general-purpose audio representations. For example, deep neural networks can extract optimal embeddings if they are trained on large audio datasets. This work extends existing methods based on self-supervised learning by boo…

    Submitted 25 October, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Submitted to HEAR-PMLR 2021

  6. arXiv:2203.16637  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load

    Authors: Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

    Abstract: As a neurophysiological response to threat or adverse conditions, stress can affect cognition, emotion and behaviour with potentially detrimental effects on health in the case of sustained exposure. Since the affective content of speech is inherently modulated by an individual's physical and mental state, a substantial body of research has been devoted to the study of paralinguistic correlates of…

    Submitted 30 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Submitted to InterSpeech 2022

  7. arXiv:2110.03414  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    SERAB: A multi-lingual benchmark for speech emotion recognition

    Authors: Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

    Abstract: Recent developments in speech emotion recognition (SER) often leverage deep neural networks (DNNs). Comparing and benchmarking different DNN models can often be tedious due to the use of different datasets and evaluation protocols. To facilitate the process, here, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization c…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  8. Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing

    Authors: Pierre Beckmann, Mikolaj Kegler, Milos Cernak

    Abstract: Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech recognition. Up to date, most of these approaches are task-specific and designed for within-task transfer learning between different datasets or setups of a parti…

    Submitted 14 December, 2021; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Published at EUSIPCO 2021

    Journal ref: 2021 29th European Signal Processing Conference (EUSIPCO), pp. 446-450

  9. Deep speech inpainting of time-frequency masks

    Authors: Mikolaj Kegler, Pierre Beckmann, Milos Cernak

    Abstract: Transient loud intrusions, often occurring in noisy environments, can completely overpower speech signal and lead to an inevitable loss of information. While existing algorithms for noise suppression can yield impressive results, their efficacy remains limited for very low signal-to-noise ratios or when parts of the signal are missing. To address these limitations, here we propose an end-to-end fr…

    Submitted 29 August, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted to InterSpeech 2020

    Journal ref: Proc. Interspeech 2020, 3276-3280