Showing 1–8 of 8 results for author: Labbé, E

  1. Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging

    Authors: Ludovic Tuncay, Etienne Labbé, Thomas Pellegrini

    Abstract: AudioSet is one of the most used and largest datasets in audio tagging, containing about 2 million audio samples that are manually labeled with 527 event categories organized into an ontology. However, the annotations contain inconsistencies, particularly where categories that should be labeled as positive according to the ontology are frequently mislabeled as negative. To address this issue, we a…

    Submitted 26 March, 2025; originally announced March 2025.

    Journal ref: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2025, Hyderabad, India. pp.1-5

  2. arXiv:2309.07615  [pdf, other]

    cs.SD eess.AS

    Multilingual Audio Captioning using machine translated data

    Authors: Matéo Cousin, Étienne Labbé, Thomas Pellegrini

    Abstract: Automated Audio Captioning (AAC) systems attempt to generate a natural language sentence, a caption, that describes the content of an audio recording in terms of sound events. Existing datasets provide audio-caption pairs, with captions written in English only. In this work, we explore multilingual AAC, using machine-translated captions. We automatically translated two prominent AAC datasets, Aud…

    Submitted 14 September, 2023; originally announced September 2023.

  3. arXiv:2309.00454  [pdf, other]

    cs.SD eess.AS

    CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding

    Authors: Étienne Labbé, Thomas Pellegrini, Julien Pinquier

    Abstract: Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content, using encoder-decoder architectures. An audio encoder produces audio embeddings fed to a decoder, usually a Transformer decoder, for caption generation. In this work, we describe our model, whose novelty, compared to existing models, lies in the use of a ConvNeXt architecture as audio encoder, adap…

    Submitted 1 September, 2023; originally announced September 2023.

  4. arXiv:2308.15090  [pdf, other]

    cs.CL cs.IR cs.SD eess.AS

    Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

    Authors: Etienne Labbé, Thomas Pellegrini, Julien Pinquier

    Abstract: Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio recording using a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek to find the best matching audio recording(s) for a given textual query (Text-to-Audio) or vice versa (Audio-to-Text). These tasks require different types of systems: AAC employs a sequence-to-sequence model, while ATR utili…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: camera-ready version (14/08/23)

    Journal ref: DCASE2023, Sep 2023, Tampere, Finland

  5. arXiv:2306.00830  [pdf, ps, other]

    cs.SD eess.AS

    Adapting a ConvNeXt model to audio classification on AudioSet

    Authors: Thomas Pellegrini, Ismail Khalfaoui-Hassani, Etienne Labbé, Timothée Masquelier

    Abstract: In computer vision, convolutional neural networks (CNNs), such as ConvNeXt, have been able to surpass state-of-the-art transformers, partly thanks to depthwise separable convolutions (DSC). DSC, as an approximation of the regular convolution, has made CNNs more efficient in time and memory complexity without deteriorating their accuracy, and sometimes even improving it. In this paper, we first imple…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

  6. arXiv:2305.01482  [pdf, other]

    cs.SD eess.AS

    Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer

    Authors: Etienne Labbé, Julien Pinquier, Thomas Pellegrini

    Abstract: In this work, we propose to study the performance of a model trained with a sentence embedding regression loss component for the Automated Audio Captioning task. This task aims to build systems that can describe audio content with a single sentence written in natural language. Most systems are trained with the standard Cross-Entropy loss, which does not take into account the semantic closeness of…

    Submitted 2 May, 2023; originally announced May 2023.

  7. arXiv:2211.08983  [pdf, other]

    cs.SD cs.LG eess.AS

    Is my automatic audio captioning system so bad? spider-max: a metric to consider several caption candidates

    Authors: Etienne Labbé, Thomas Pellegrini, Julien Pinquier

    Abstract: Automatic Audio Captioning (AAC) is the task that aims to describe an audio signal using natural language. AAC systems take as input an audio signal and output a free-form text sentence, called a caption. Evaluating such systems is not trivial, since there are many ways to express the same idea. For this reason, several complementary metrics, such as BLEU, CIDEr, SPICE and SPIDEr, are used to comp…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022), Nov 2022, Nancy, France

  8. arXiv:2102.08183  [pdf, other]

    cs.SD cs.LG eess.AS

    Comparison of semi-supervised deep learning algorithms for audio classification

    Authors: Léo Cances, Etienne Labbé, Thomas Pellegrini

    Abstract: In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. The three other algorithms, called MixMatch (MM), ReMixMatch (RMM), and FixMatch (FM), are single-model methods that rely primarily on data augmentation strategies. Using the Wide-ResNet-28-2…

    Submitted 8 March, 2023; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 9 pages, 5 figures, 5 tables. This is version 3 of the paper. It contains minor fixes compared to the EURASIP version (version 2 of the paper)

    Journal ref: EURASIP Journal on Audio, Speech, and Music Processing, 2022