Showing 1–8 of 8 results for author: Alonso-Jimenez, P

  1. arXiv:2509.06936  [pdf, ps, other]

    cs.SD eess.AS

    Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets

    Authors: Pedro Ramoneda, Pablo Alonso-Jiménez, Sergio Oramas, Xavier Serra, Dmitry Bogdanov

    Abstract: Music autotagging aims to automatically assign descriptive tags, such as genre, mood, or instrumentation, to audio recordings. Due to its challenges, diversity of semantic descriptions, and practical value in various applications, it has become a common downstream task for evaluating the performance of general-purpose music representations learned from audio data. We introduce a new benchmarking d…

    Submitted 8 September, 2025; originally announced September 2025.

  2. arXiv:2507.03482  [pdf, ps, other]

    cs.SD eess.AS

    OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction

    Authors: Pablo Alonso-Jiménez, Pedro Ramoneda, R. Oguz Araz, Andrea Poltronieri, Dmitry Bogdanov

    Abstract: Developing open-source foundation models is essential for advancing research in music audio understanding and ensuring access to powerful, multipurpose representations for music information retrieval. We present OMAR-RQ, a model trained with self-supervision via masked token classification methodologies using a large-scale dataset with over 330,000 hours of music audio. We experiment with differen…

    Submitted 4 July, 2025; originally announced July 2025.

  3. arXiv:2402.09318  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

    Authors: Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora

    Abstract: We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model is based on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple both training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing represe…

    Submitted 14 February, 2024; originally announced February 2024.

  4. arXiv:2312.05994  [pdf, other]

    cs.SD cs.IR eess.AS

    mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks

    Authors: Christos Plachouras, Pablo Alonso-Jiménez, Dmitry Bogdanov

    Abstract: Music Information Retrieval (MIR) research is increasingly leveraging representation learning to obtain more compact, powerful music audio representations for various downstream MIR tasks. However, current representation evaluation methods are fragmented due to discrepancies in audio and label preprocessing, downstream model and metric implementations, data availability, and computational resource…

    Submitted 12 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Machine Learning for Audio Workshop, Neural Information Processing Systems (NeurIPS) 2023, New Orleans, LA

  5. arXiv:2309.16418  [pdf, other]

    cs.SD eess.AS

    Efficient Supervised Training of Audio Transformers for Music Representation Learning

    Authors: Pablo Alonso-Jiménez, Xavier Serra, Dmitry Bogdanov

    Abstract: In this work, we address music representation learning using convolution-free transformers. We build on top of existing spectrogram-based audio transformers such as AST and train our models on a supervised task using patchout training similar to PaSST. In contrast to previous works, we study how specific design decisions affect downstream music tagging tasks instead of focusing on the training tas…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted at the 2023 International Society for Music Information Retrieval Conference (ISMIR'23)

  6. arXiv:2304.12257  [pdf, other]

    cs.SD eess.AS

    Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity

    Authors: Pablo Alonso-Jiménez, Xavier Favory, Hadrien Foroughmand, Grigoris Bourdalas, Xavier Serra, Thomas Lidy, Dmitry Bogdanov

    Abstract: In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to us…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23)

  7. Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification

    Authors: Jose J. Valero-Mas, Antonio Javier Gallego, Pablo Alonso-Jiménez, Xavier Serra

    Abstract: Prototype Generation (PG) methods are typically considered for improving the efficiency of the $k$-Nearest Neighbour ($k$NN) classifier when tackling high-size corpora. Such approaches aim at generating a reduced version of the corpus without decreasing the classification performance when compared to the initial set. Despite their large application in multiclass scenarios, very few works have addr…

    Submitted 20 March, 2025; v1 submitted 22 July, 2022; originally announced July 2022.

    Journal ref: Pattern Recognition, Vol. 135, 2023

  8. arXiv:2003.07393  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    TensorFlow Audio Models in Essentia

    Authors: Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

    Abstract: Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pr…

    Submitted 16 March, 2020; originally announced March 2020.