
Showing 1–27 of 27 results for author: Peeters, G

Searching in archive cs.
  1. arXiv:2505.03337  [pdf, other]

    cs.SD eess.AS eess.SP stat.ML

    The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis

    Authors: Bernardo Torres, Geoffroy Peeters, Gael Richard

    Abstract: We introduce the Inverse Drum Machine (IDM), a novel approach to drum source separation that combines analysis-by-synthesis with deep learning. Unlike recent supervised methods that rely on isolated stems, IDM requires only transcription annotations. It jointly optimizes automatic drum transcription and one-shot drum sample synthesis in an end-to-end framework. By convolving synthesized one-shot s…

    Submitted 6 May, 2025; originally announced May 2025.

  2. arXiv:2503.19597  [pdf, other]

    cs.SD eess.SP

    QINCODEC: Neural Audio Compression with Implicit Neural Codebooks

    Authors: Zineb Lahrichi, Gaëtan Hadjeres, Gael Richard, Geoffroy Peeters

    Abstract: Neural audio codecs, neural networks which compress a waveform into discrete tokens, play a crucial role in the recent development of audio generative models. State-of-the-art codecs rely on the end-to-end training of an autoencoder and a quantization bottleneck. However, this approach restricts the choice of quantization methods, as it requires defining how gradients propagate through the qua…

    Submitted 19 March, 2025; originally announced March 2025.

  3. Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning

    Authors: Aurian Quelennec, Pierre Chouteau, Geoffroy Peeters, Slim Essid

    Abstract: Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be more suited for downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (M…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: Copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Journal ref: ICASSP 2025

  4. arXiv:2411.19806  [pdf, other]

    cs.SD cs.AI eess.AS

    Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures

    Authors: Alain Riou, Antonin Gagneré, Gaëtan Hadjeres, Stefan Lattner, Geoffroy Peeters

    Abstract: In this paper, we tackle the task of musical stem retrieval. Given a musical mix, it consists in retrieving a stem that would fit with it, i.e., that would sound pleasant if played together. To do so, we introduce a new method based on Joint-Embedding Predictive Architectures, where an encoder and a predictor are jointly trained to produce latent representations of a context and predict latent rep…

    Submitted 24 February, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

    Comments: Accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)

  5. arXiv:2411.04152  [pdf, other]

    eess.AS cs.SD

    A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning

    Authors: Antonin Gagnere, Geoffroy Peeters, Slim Essid

    Abstract: In this paper, we propose a novel Self-Supervised-Learning scheme to train rhythm analysis systems and instantiate it for few-shot beat tracking. Taking inspiration from the Contrastive Predictive Coding paradigm, we propose to train a Log-Mel-Spectrogram Transformer encoder to contrast observations at times separated by hypothesized beat intervals from those that are not. We do this without the k…

    Submitted 6 November, 2024; originally announced November 2024.

    Journal ref: ISMIR 2024, Nov 2024, San Francisco, California, United States

  6. arXiv:2410.05302  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.SP

    Episodic fine-tuning prototypical networks for optimization-based few-shot learning: Application to audio classification

    Authors: Xuanyu Zhuang, Geoffroy Peeters, Gaël Richard

    Abstract: The Prototypical Network (ProtoNet) has emerged as a popular choice in Few-shot Learning (FSL) scenarios due to its remarkable performance and straightforward implementation. Building upon such success, we first propose a simple (yet novel) method to fine-tune a ProtoNet on the (labeled) support set of a C-way-K-shot test episode (without using the query set which is only used…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Accepted at MLSP 2024

    Journal ref: 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2024), Sep 2024, London, United Kingdom

  7. arXiv:2408.02514  [pdf, other]

    cs.SD cs.LG eess.AS

    Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Michael Anslow, Geoffroy Peeters

    Abstract: This paper explores the automated process of determining stem compatibility by identifying audio recordings of single instruments that blend well with a given musical context. To tackle this challenge, we present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset using a self-supervised learning approach. Our model comprises two networks: an encode…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Proceedings of the 25th International Society for Music Information Retrieval Conference, ISMIR 2024

  8. arXiv:2405.08679  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

    Abstract: This paper addresses the problem of self-supervised general-purpose audio representation learning. We explore the use of Joint-Embedding Predictive Architectures (JEPA) for this task, which consists of splitting an input mel-spectrogram into two parts (context and target), computing neural representations for each, and training the neural network to predict the target representations from the cont…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024

  9. arXiv:2312.14507  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport

    Authors: Bernardo Torres, Geoffroy Peeters, Gaël Richard

    Abstract: In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory th…

    Submitted 15 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted in ICASSP 2024

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul, South Korea

  10. arXiv:2312.14005  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    On the choice of the optimal temporal support for audio classification with Pre-trained embeddings

    Authors: Aurian Quelennec, Michel Olvera, Geoffroy Peeters, Slim Essid

    Abstract: Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is the subject of many recent publications. However, one aspect often overlooked in these works is the influence of the duration of audio input considered to extract an embedding, which we refer to as Temporal Suppor…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  11. arXiv:2310.11781  [pdf, other]

    cs.SD eess.AS

    Blind estimation of audio effects using an auto-encoder approach and differentiable digital signal processing

    Authors: Côme Peladeau, Geoffroy Peeters

    Abstract: Blind Estimation of Audio Effects (BE-AFX) aims at estimating the Audio Effects (AFXs) applied to an original, unprocessed audio sample solely based on the processed audio sample. To train such a system, traditional approaches optimize a loss between ground truth and estimated AFX parameters. This involves knowing the exact implementation of the AFXs used for the process. In this work, we propose a…

    Submitted 9 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

  12. arXiv:2309.02265  [pdf, other]

    eess.AS cs.SD

    PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

    Abstract: In this paper, we address the problem of pitch estimation using Self-Supervised Learning (SSL). The SSL paradigm we use is equivariance to pitch transposition, which enables our model to accurately perform pitch estimation on monophonic audio after being trained only on a small unlabeled dataset. We use a lightweight (< 30k parameters) Siamese neural network that takes as inputs two different pi…

    Submitted 5 September, 2023; originally announced September 2023.

  13. arXiv:2309.02243  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Similarity-Based and Novelty-based loss for music structure analysis

    Authors: Geoffroy Peeters

    Abstract: Music Structure Analysis (MSA) is the task of identifying the musical segments that compose a music track and possibly labeling them based on their similarity. In this paper, we propose a supervised approach for the task of music boundary detection. In our approach, we simultaneously learn features and convolution kernels. For this, we jointly optimize -- a loss based on the Self-Similarity-Matrix (S…

    Submitted 5 September, 2023; originally announced September 2023.

  14. arXiv:2306.07187  [pdf, other]

    cs.MM cs.IR cs.LG cs.SD eess.AS

    Video-to-Music Recommendation using Temporal Alignment of Segments

    Authors: Laure Prétet, Gaël Richard, Clément Souchier, Geoffroy Peeters

    Abstract: We study cross-modal recommendation of music tracks to be used as soundtracks for videos. This problem is known as the music supervision task. We build on a self-supervised system that learns a content association between music and video. In addition to the adequacy of content, adequacy of structure is crucial in music supervision to obtain relevant recommendations. We propose a novel approach to…

    Submitted 12 June, 2023; originally announced June 2023.

    Journal ref: IEEE Transactions on Multimedia, 18 February 2022

  15. arXiv:2211.08141  [pdf, other]

    cs.SD cs.LG eess.AS

    SSM-Net: feature learning for Music Structure Analysis using a Self-Similarity-Matrix based loss

    Authors: Geoffroy Peeters, Florian Angulo

    Abstract: In this paper, we propose a new paradigm to learn audio features for Music Structure Analysis (MSA). We train a deep encoder to learn features such that the Self-Similarity-Matrix (SSM) resulting from those approximates a ground-truth SSM. This is done by minimizing a loss between both SSMs. Since this loss is differentiable w.r.t. its input features, we can train the encoder in a straightforward w…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022

  16. arXiv:2211.07250  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts

    Authors: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, Gaël Richard

    Abstract: As music has become more available, especially on music streaming platforms, people have started to have distinct preferences that fit their varying listening situations, also known as context. Hence, there has been a growing interest in considering the user's situation when recommending music to users. Previous works have proposed user-aware autotaggers to infer situation-related tags from music…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Published in ISMIR

  17. arXiv:2202.09198  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable Evaluation

    Authors: Christof Weiß, Geoffroy Peeters

    Abstract: Extracting pitch information from music recordings is a challenging but important problem in music signal processing. Frame-wise transcription or multi-pitch estimation aims at detecting the simultaneous activity of pitches in polyphonic music recordings and has recently seen major improvements thanks to deep-learning techniques, with a variety of proposed network architectures. In this paper, we…

    Submitted 18 February, 2022; originally announced February 2022.

  18. arXiv:2108.00970  [pdf, other]

    cs.MM

    Is there a "language of music-video clips" ? A qualitative and quantitative study

    Authors: Laure Prétet, Gaël Richard, Geoffroy Peeters

    Abstract: Automatically recommending a video for a given music track, or a music track for a given video, has become an important asset for the audiovisual industry, for both user-generated and professional content. While both music and video have specific temporal organizations, most current works do not consider them and focus only on recommending a medium globally. As a first step toward the improvement of these recommendation sy…

    Submitted 2 August, 2021; originally announced August 2021.

  19. arXiv:2104.14799  [pdf, other]

    cs.MM

    Cross-Modal Music-Video Recommendation: A Study of Design Choices

    Authors: Laure Pretet, Gael Richard, Geoffroy Peeters

    Abstract: In this work, we study music/video cross-modal recommendation, i.e. recommending a music track for a video or vice versa. We rely on a self-supervised learning paradigm to learn from a large amount of unlabelled data. More precisely, we jointly learn audio and video embeddings by using their co-occurren…

    Submitted 30 April, 2021; originally announced April 2021.

  20. arXiv:2008.02070  [pdf, other]

    eess.AS cs.LG cs.SD

    Content based singing voice source separation via strong conditioning using aligned phonemes

    Authors: Gabriel Meseguer-Brocal, Geoffroy Peeters

    Abstract: Informed source separation has recently gained renewed interest with the introduction of neural networks and the availability of large multitrack datasets containing both the mixture and the separated sources. These approaches use prior information about the target source to improve separation. Historically, Music Information Retrieval researchers have focused primarily on score-informed source se…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: 21st International Society for Music Information Retrieval Conference 11-15 October 2020, Montreal, Canada

  21. arXiv:2005.12977  [pdf, other]

    cs.IR cs.CV cs.SD eess.AS

    Learning to rank music tracks using triplet loss

    Authors: Laure Prétet, Gaël Richard, Geoffroy Peeters

    Abstract: Most music streaming services rely on automatic recommendation algorithms to exploit their large music catalogs. These algorithms aim at retrieving a ranked list of music tracks based on their similarity with a target music track. In this work, we propose a method for direct recommendation based on the audio content without explicitly tagging the music tracks. To that aim, we propose several strat…

    Submitted 18 May, 2020; originally announced May 2020.

  22. arXiv:1910.09862  [pdf, other]

    cs.LG cs.SD stat.ML

    A Prototypical Triplet Loss for Cover Detection

    Authors: Guillaume Doras, Geoffroy Peeters

    Abstract: Automatic cover detection -- the task of finding in an audio dataset all covers of a query track -- has long been a challenging theoretical problem in the MIR community. It has also become a practical need for music composers' societies, which must automatically detect whether an audio excerpt embeds musical content belonging to their catalog. In a recent work, we addressed this problem with a convolutional n…

    Submitted 9 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Corrections after reviewers' comments. Corrects the erroneous figure 5 in the original version

  23. arXiv:1907.01824  [pdf, other]

    cs.SD cs.LG stat.ML

    Cover Detection using Dominant Melody Embeddings

    Authors: Guillaume Doras, Geoffroy Peeters

    Abstract: Automatic cover detection -- the task of finding in an audio database all the covers of one or several query tracks -- has long been seen as a challenging theoretical problem in the MIR community and as an acute practical problem for authors and composers societies. Original algorithms proposed for this task have proven their accuracy on small datasets, but are unable to scale up to modern real-li…

    Submitted 3 July, 2019; originally announced July 2019.

    Journal ref: 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019

  24. arXiv:1907.01277  [pdf, other]

    eess.AS cs.LG cs.SD

    Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations

    Authors: Gabriel Meseguer-Brocal, Geoffroy Peeters

    Abstract: Data-driven models for audio source separation such as U-Net or Wave-U-Net are usually models dedicated to and specifically trained for a single task, e.g. a particular instrument isolation. Training them for various tasks at once commonly results in worse performance than training them for a single specialized task. In this work, we introduce the Conditioned-U-Net (C-U-Net) which adds a control…

    Submitted 21 November, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

    Journal ref: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR, Delft, Netherlands, 2019

  25. arXiv:1906.10606  [pdf, other]

    eess.AS cs.DB cs.LG cs.SD

    DALI: a large Dataset of synchronized Audio, LyrIcs and notes, automatically created using teacher-student machine learning paradigm

    Authors: Gabriel Meseguer-Brocal, Alice Cohen-Hadria, Geoffroy Peeters

    Abstract: The goal of this paper is twofold. First, we introduce DALI, a large and rich multimodal dataset containing 5358 audio tracks with their time-aligned vocal melody notes and lyrics at four levels of granularity. The second goal is to explain our methodology where dataset creation and learning models interact using a teacher-student machine learning paradigm that benefits each other. We start with a…

    Submitted 25 June, 2019; originally announced June 2019.

    Journal ref: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR, Paris, France, pp. 431-437, 2018

  26. arXiv:1903.01415  [pdf, other]

    cs.SD eess.AS

    Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation

    Authors: Alice Cohen-Hadria, Axel Roebel, Geoffroy Peeters

    Abstract: State-of-the-art singing voice separation is based on deep learning, making use of CNN structures with skip connections (such as the U-Net, Wave-U-Net, or MSDENSELSTM models). A key to the success of these models is the availability of a large amount of training data. In this study, we are interested in singing voice separation for mono signals and compare the U-Net a…

    Submitted 4 March, 2019; originally announced March 2019.

    Journal ref: Published in Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 2019

  27. arXiv:1805.01201  [pdf, ps, other]

    cs.SD eess.AS

    Single-Channel Blind Source Separation for Singing Voice Detection: A Comparative Study

    Authors: Dominique Fourer, Geoffroy Peeters

    Abstract: We propose a novel unsupervised singing voice detection method which uses a single-channel Blind Audio Source Separation (BASS) algorithm as a preliminary step. To reach this goal, we investigate three promising BASS approaches which operate through a morphological filtering of the analyzed mixture spectrogram. The contributions of this paper are manifold. First, the investigated BASS methods are rew…

    Submitted 3 May, 2018; originally announced May 2018.