
Showing 1–6 of 6 results for author: Morocutti, T

Searching in archive eess.
  1. arXiv:2503.11373  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models

    Authors: Tobias Morocutti, Florian Schmid, Jonathan Greif, Francesco Foscarin, Gerhard Widmer

    Abstract: We target the problem of developing new low-complexity networks for the sound event detection task. Our goal is to meticulously analyze the performance-complexity trade-off, aiming to be competitive with the large state-of-the-art models, at a fraction of the computational requirements. We find that low-complexity convolutional models previously proposed for audio tagging can be effectively adapte…

    Submitted 12 June, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: In Proceedings of the 33rd European Signal Processing Conference (EUSIPCO 2025), Palermo, Italy

  2. arXiv:2503.11363  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

    Authors: Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer

    Abstract: Knowledge Distillation (KD) is a widespread technique for compressing the knowledge of large models into more compact and efficient models. KD has proved to be highly effective in building well-performing low-complexity Acoustic Scene Classification (ASC) systems and was used in all the top-ranked submissions to this task of the annual DCASE challenge in the past three years. There is extensive re…

    Submitted 14 March, 2025; originally announced March 2025.

  3. arXiv:2409.09546  [pdf, other]

    eess.AS cs.SD

    Effective Pre-Training of Audio Transformers for Sound Event Detection

    Authors: Florian Schmid, Tobias Morocutti, Francesco Foscarin, Jan Schlüter, Paul Primus, Gerhard Widmer

    Abstract: We propose a pre-training pipeline for audio spectrogram transformers for frame-level sound event detection tasks. On top of common pre-training steps, we add a meticulously designed training routine on AudioSet frame-level annotations. This includes a balanced sampler, aggressive data augmentation, and ensemble knowledge distillation. For five transformers, we obtain a substantial performance imp…

    Submitted 28 November, 2024; v1 submitted 14 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP'25. Source code available: https://github.com/fschmid56/PretrainedSED

  4. arXiv:2408.00791  [pdf, other]

    eess.AS cs.SD

    Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training

    Authors: Florian Schmid, Paul Primus, Tobias Morocutti, Jonathan Greif, Gerhard Widmer

    Abstract: This technical report describes the CP-JKU team's submission for Task 4 Sound Event Detection with Heterogeneous Training Datasets and Potentially Missing Labels of the DCASE 24 Challenge. We fine-tune three large Audio Spectrogram Transformers, PaSST, BEATs, and ATST, on the joint DESED and MAESTRO datasets in a two-stage training procedure. The first stage closely matches the baseline system set…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: Technical Report describing our system for DCASE2024 Challenge Task 4: https://dcase.community/challenge2024/task-sound-event-detection-with-heterogeneous-training-dataset-and-potentially-missing-labels-results Code: https://github.com/CPJKU/cpjku_dcase24. arXiv admin note: text overlap with arXiv:2407.12997

  5. arXiv:2407.12997  [pdf, other]

    eess.AS

    Multi-Iteration Multi-Stage Fine-Tuning of Transformers for Sound Event Detection with Heterogeneous Datasets

    Authors: Florian Schmid, Paul Primus, Tobias Morocutti, Jonathan Greif, Gerhard Widmer

    Abstract: A central problem in building effective sound event detection systems is the lack of high-quality, strongly annotated sound event datasets. For this reason, Task 4 of the DCASE 2024 challenge proposes learning from two heterogeneous datasets, including audio clips labeled with varying annotation granularity and with different sets of possible events. We propose a multi-iteration, multi-stage proce…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/CPJKU/cpjku_dcase24

  6. arXiv:2305.07499  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

    Authors: Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer

    Abstract: The ability to generalize to a wide range of recording devices is a crucial performance factor for audio classification models. The characteristics of different types of microphones introduce distributional shifts in the digitized audio signals due to their varying frequency responses. If this domain shift is not taken into account during training, the model's performance could degrade severely wh…

    Submitted 27 June, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: In Proceedings of the 31st European Signal Processing Conference, EUSIPCO 2023. Source code available at: https://github.com/theMoro/DIRAugmentation/
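The entry "Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification" builds on knowledge distillation (KD), but its truncated abstract does not show the exact loss or teacher setup the authors use. As a generic point of reference only, here is a minimal sketch of the classic Hinton-style KD objective: a weighted sum of hard-label cross-entropy and a temperature-softened KL divergence from teacher to student. The function names, `temperature`, and `alpha` are illustrative assumptions, not the paper's configuration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Generic Hinton-style KD (hypothetical helper, not the paper's code):
    alpha * CE(label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label cross-entropy on the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft targets: teacher and student are softened with the same temperature.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    # The T^2 factor keeps the soft-target term on a comparable gradient scale.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

When teacher and student logits coincide, the KL term vanishes and the loss reduces to `alpha` times the cross-entropy; any disagreement with the teacher can only increase it, since the KL divergence is non-negative.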
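The entry "Device-Robust Acoustic Scene Classification via Impulse Response Augmentation" attributes device mismatch to the differing frequency responses of microphones. A minimal sketch of the general idea behind impulse-response augmentation, assuming a pool of device impulse responses is available and that each training clip is convolved with a randomly chosen one with a fixed probability `p`. The function names, `p`, and the plain-Python convolution are illustrative assumptions; the authors' actual implementation is in the repository linked in that entry.

```python
import random
from typing import List, Optional, Sequence


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Causal linear convolution y[n] = sum_k ir[k] * x[n - k],
    keeping only the first len(signal) samples so the clip length is unchanged."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(len(ir), n + 1)):
            acc += ir[k] * signal[n - k]
        out[n] = acc
    return out


def ir_augment(signal: Sequence[float], impulse_responses: Sequence[Sequence[float]],
               p: float = 0.5, rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical helper: with probability p, simulate recording the clip on
    another device by convolving it with a randomly chosen impulse response."""
    rng = rng or random.Random()
    if not impulse_responses or rng.random() >= p:
        return list(signal)  # leave the clip unchanged
    ir = rng.choice(impulse_responses)
    return convolve(signal, ir)
```

An identity response `[1.0]` leaves the clip unchanged, and a response such as `[0.0, 1.0]` delays it by one sample; real device impulse responses would also reshape its spectrum. A practical pipeline would typically use an FFT-based convolution instead of this quadratic loop.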