Showing 1–6 of 6 results for author: Ferri, F J

  1. arXiv:2503.11312  [pdf, other]

    eess.SP cs.SD eess.AS

    A Data-Driven Exploration of Elevation Cues in HRTFs: An Explainable AI Perspective Across Multiple Datasets

    Authors: Juan Antonio De Rus, Mario Montagud, Jesus Lopez-Ballester, Francesc J. Ferri, Maximo Cobos

    Abstract: Precise elevation perception in binaural audio remains a challenge, despite extensive research on head-related transfer functions (HRTFs) and spectral cues. While prior studies have advanced our understanding of sound localization cues, the interplay between spectral features and elevation perception is still not fully understood. This paper presents a comprehensive analysis of over 600 subjects f…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 14 pages, 9 figures

  2. arXiv:2107.14658  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Task 1A DCASE 2021: Acoustic Scene Classification with mismatch-devices using squeeze-excitation technique and low-complexity constraint

    Authors: Javier Naranjo-Alcazar, Sergi Perez-Castanos, Maximo Cobos, Francesc J. Ferri, Pedro Zuccarello

    Abstract: Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening. The objective of this problem is to classify an audio clip into one of the predefined scenes using only the audio data. This problem has considerably progressed over the years in the different editions of DCASE. It usually has several subtasks that allow to tackle this problem with different…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: Submitted to Task 1a DCASE 2021 Challenge

  3. arXiv:2107.14561  [pdf, other]

    cs.SD cs.LG eess.AS

    TASK3 DCASE2021 Challenge: Sound event localization and detection using squeeze-excitation residual CNNs

    Authors: Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Francesc J. Ferri, Maximo Cobos

    Abstract: Sound event localisation and detection (SELD) is a problem in the field of automatic listening that aims at the temporal detection and localisation (direction of arrival estimation) of sound events within an audio clip, usually of long duration. Due to the amount of data present in the datasets related to this problem, solutions based on deep learning have positioned themselves at the top of the s…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: Submitted to Task3 DCASE Challenge 2021

  4. arXiv:2107.13180  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS eess.IV

    Squeeze-Excitation Convolutional Recurrent Neural Networks for Audio-Visual Scene Classification

    Authors: Javier Naranjo-Alcazar, Sergi Perez-Castanos, Aaron Lopez-Garcia, Pedro Zuccarello, Maximo Cobos, Francesc J. Ferri

    Abstract: The use of multiple and semantically correlated sources can provide complementary information to each other that may not be evident when working with individual modalities on their own. In this context, multi-modal models can help producing more accurate and robust predictions in machine learning tasks where audio-visual data is available. This paper presents a multi-modal model for automatic scen…

    Submitted 28 July, 2021; originally announced July 2021.

  5. arXiv:2002.11561  [pdf, other]

    cs.SD cs.LG eess.AS

    An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments

    Authors: Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Ana M. Torres, Jose J. Lopez, Francesc J. Ferri, Maximo Cobos

    Abstract: The problem of training with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications include those related to face…

    Submitted 11 April, 2022; v1 submitted 26 February, 2020; originally announced February 2020.

    Comments: Submitted to IEEE Access

  6. arXiv:1906.04591  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    CNN depth analysis with different channel inputs for Acoustic Scene Classification

    Authors: Sergi Perez-Castanos, Javier Naranjo-Alcazar, Pedro Zuccarello, Maximo Cobos, Francesc J. Ferri

    Abstract: Acoustic scene classification (ASC) has been approached in the last years using deep learning techniques such as convolutional neural networks or recurrent neural networks. Many state-of-the-art solutions are based on image classification frameworks and, as such, a 2D representation of the audio signal is considered for training these networks. Finding the most suitable audio representation is sti…

    Submitted 13 August, 2021; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Accepted at URSI 2020, Malaga, Spain