Showing 1–11 of 11 results for author: Eghbal-Zadeh, H

  1. arXiv:2211.13956

    cs.SD cs.LG eess.AS

    Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers

    Authors: Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, Gerhard Widmer

    Abstract: The success of supervised deep learning methods is largely due to their ability to learn relevant features from raw data. Deep Neural Networks (DNNs) trained on large-scale datasets are capable of capturing a diverse set of features, and learning a representation that can generalize onto unseen tasks and datasets that are from the same domain. Hence, these models can be used as powerful feature ex…

    Submitted 2 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Will appear in HEAR: Holistic Evaluation of Audio Representations, Proceedings of Machine Learning Research (PMLR) 166. Source code: https://github.com/kkoutini/passt_hear21

    Journal ref: Proceedings of Machine Learning Research v166 (2022) 65-89

  2. Efficient Training of Audio Transformers with Patchout

    Authors: Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer

    Abstract: The great success of transformer-based models in natural language processing (NLP) has led to various attempts at adapting these architectures to other domains such as vision and audio. Recent work has shown that transformers can outperform Convolutional Neural Networks (CNNs) on vision and audio tasks. However, one of the main shortcomings of transformer models, compared to the well-established C…

    Submitted 29 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Submitted to Interspeech 2022. Source code: https://github.com/kkoutini/PaSST

  3. arXiv:2107.08933

    cs.SD cs.LG eess.AS stat.ML

    Over-Parameterization and Generalization in Audio Classification

    Authors: Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Presented at the ICML 2021 Workshop on Overparameterization: Pitfalls & Opportunities

  4. arXiv:2105.12395

    cs.SD cs.LG cs.NE eess.AS

    Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks

    Authors: Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer

    Abstract: In this paper, we study the performance of variants of well-known Convolutional Neural Network (CNN) architectures on different audio tasks. We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization. An insufficient RF limits the CNN's ability to fit the training data. In contrast, CNNs with an excessive RF tend to over-fit the training data and fail to generalize to…

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Code available: https://github.com/kkoutini/cpjku_dcase20

  5. arXiv:2007.13503

    eess.AS cs.LG cs.SD

    Receptive-Field Regularized CNNs for Music Classification and Tagging

    Authors: Khaled Koutini, Hamid Eghbal-Zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have been successfully used in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on l…

    Submitted 27 July, 2020; originally announced July 2020.

  6. arXiv:1911.05833

    cs.SD cs.LG cs.MM eess.AS

    Emotion and Theme Recognition in Music with Frequency-Aware RF-Regularized CNNs

    Authors: Khaled Koutini, Shreyan Chowdhury, Verena Haunschmid, Hamid Eghbal-zadeh, Gerhard Widmer

    Abstract: We present CP-JKU submission to MediaEval 2019; a Receptive Field-(RF)-regularized and Frequency-Aware CNN approach for tagging music with emotion/mood labels. We perform an investigation regarding the impact of the RF of the CNNs on their performance on this dataset. We observe that ResNets with smaller receptive fields -- originally adapted for acoustic scene classification -- also perform well…

    Submitted 28 October, 2019; originally announced November 2019.

    Comments: MediaEval'19, 27-29 October 2019, Sophia Antipolis, France

  7. arXiv:1909.02869

    eess.AS cs.LG cs.SD stat.ML

    Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

    Authors: Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, Gerhard Widmer

    Abstract: Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning. We study this problem in the context of machine listening (Task 1b of the DCASE 2019 Challenge). We propose a novel approach to learn domain-invariant classifiers in an end-to-end fashion by enforcing equal hidden layer representations for domain-…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Published at the Workshop on Detection and Classification of Acoustic Scenes and Events, 25-26 October 2019, New York, USA

  8. arXiv:1909.02859

    eess.AS cs.LG cs.SD stat.ML

    Receptive-field-regularized CNN variants for acoustic scene classification

    Authors: Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer

    Abstract: Acoustic scene classification and related tasks have been dominated by Convolutional Neural Networks (CNNs). Top-performing CNNs use mainly audio spectrograms as input and borrow their architectural design primarily from computer vision. A recent study has shown that restricting the receptive field (RF) of CNNs in appropriate ways is crucial for their performance, robustness and generalization in a…

    Submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted at Detection and Classification of Acoustic Scenes and Events 2019 (DCASE Workshop 2019)

  9. arXiv:1907.01803

    cs.LG cs.SD eess.AS stat.ML

    The Receptive Field as a Regularizer in Deep Convolutional Neural Networks for Acoustic Scene Classification

    Authors: Khaled Koutini, Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have had great success in many machine vision as well as machine audition tasks. Many image recognition network architectures have consequently been adapted for audio processing tasks. However, despite some successes, the performance of many of these did not translate from the image to the audio domain. For example, very deep architectures such as ResNet and De…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: IEEE EUSIPCO 2019

  10. arXiv:1807.10501

    cs.SD eess.AS

    Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments

    Authors: Romain Serizel, Nicolas Turpault, Hamid Eghbal-Zadeh, Ankit Parag Shah

    Abstract: This paper presents DCASE 2018 task 4. The task evaluates systems for the large-scale detection of sound events using weakly labeled data (without time boundaries). The target of the systems is to provide not only the event class but also the event time boundaries given that multiple events can be present in an audio recording. Another challenge of the task is to explore the possibility to exploit…

    Submitted 27 July, 2018; originally announced July 2018.

  11. arXiv:1711.04022

    cs.LG cs.AI cs.SD eess.AS

    Deep Within-Class Covariance Analysis for Robust Audio Representation Learning

    Authors: Hamid Eghbal-zadeh, Matthias Dorfer, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) can learn effective features, though have been shown to suffer from a performance drop when the distribution of the data changes from training to test data. In this paper we analyze the internal representations of CNNs and observe that the representations of unseen data in each class, spread more (with higher variance) in the embedding space of the CNN compared…

    Submitted 30 November, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

    Comments: 11 pages, 3 tables, 4 figures