
Showing 1–8 of 8 results for author: Della Libera, L

Searching in archive eess.
  1. arXiv:2502.04465

    cs.LG cs.AI cs.SD eess.AS

    FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

    Authors: Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli

    Abstract: Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous audio into tokens using neural audio codecs. However, existing approaches face limitations, including high bitrates, the loss of either semantic or acoustic info…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: 18 pages

  2. arXiv:2407.00463

    cs.LG cs.AI cs.CL cs.HC eess.AS

    Open-Source Conversational AI with SpeechBrain 1.0

    Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Ha Nguyen, et al. (8 additional authors not shown)

    Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper prese…

    Submitted 16 October, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to the Journal of Machine Learning Research (JMLR), Machine Learning Open Source Software

  3. arXiv:2406.14294

    cs.SD cs.AI eess.AS

    DASB - Discrete Audio and Speech Benchmark

    Authors: Pooneh Mousavi, Luca Della Libera, Jarod Duret, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli

    Abstract: Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models. Ideal audio tokens must effectively preserve phonetic and semantic content along with paralinguistic information, speaker identity, and other details. While several types of audio tokens have been recently…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 tables

  4. arXiv:2406.10735

    cs.SD cs.AI cs.CL eess.AS

    How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

    Authors: Pooneh Mousavi, Jarod Duret, Salah Zaiem, Luca Della Libera, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli

    Abstract: Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing. Ideal audio tokens must preserve content, paralinguistic elements, speaker identity, and many other audio details. Current audio tokenization methods fall into two categories: Semantic tokens, acquired through quantization of Self-Supervised Learning (SSL) models, and N…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures, 2 tables, Accepted at Interspeech 2024

  5. arXiv:2405.17615

    cs.SD cs.LG eess.AS eess.SP

    Listenable Maps for Zero-Shot Audio Classifiers

    Authors: Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan

    Abstract: Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-Shot context), which, to the best of our knowledge, is the first decoder-based post-hoc interpretation method for explaining the decisions of zero-shot…

    Submitted 21 April, 2025; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted to NeurIPS 2024

  6. arXiv:2402.02754

    cs.SD cs.LG eess.AS

    Focal Modulation Networks for Interpretable Sound Classification

    Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli

    Abstract: The increasing success of deep neural networks has raised concerns about their inherent black-box nature, posing challenges related to interpretability and trust. While there has been extensive exploration of interpretation techniques in vision and language, interpretability in the audio domain has received limited attention, primarily focusing on post-hoc explanations. This paper addresses the pr…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024 XAI-SA Workshop

  7. Audio Editing with Non-Rigid Text Prompts

    Authors: Francesco Paissan, Luca Della Libera, Zhepei Wang, Mirco Ravanelli, Paris Smaragdis, Cem Subakan

    Abstract: In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-pro…

    Submitted 24 September, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Proceedings of Interspeech. Please refer to the reference available at https://www.isca-archive.org/interspeech_2024/paissan24b_interspeech.html

  8. arXiv:2206.09507

    eess.AS cs.LG cs.SD eess.SP

    Resource-Efficient Separation Transformer

    Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin

    Abstract: Transformers have recently achieved state-of-the-art performance in speech separation. These models, however, are computationally demanding and require a lot of learnable parameters. This paper explores Transformer-based speech separation with a reduced computational cost. Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-bas…

    Submitted 15 January, 2024; v1 submitted 19 June, 2022; originally announced June 2022.

    Comments: Accepted to ICASSP 2024