Showing 1–14 of 14 results for author: Comanducci, L

  1. arXiv:2505.07615  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models

    Authors: Riccardo Passoni, Francesca Ronchini, Luca Comanducci, Romain Serizel, Fabio Antonacci

    Abstract: Text-to-audio models have recently emerged as a powerful technology for generating sound from textual descriptions. However, their high computational demands raise concerns about energy consumption and environmental impact. In this paper, we conduct an analysis of the energy usage of 7 state-of-the-art text-to-audio diffusion-based generative models, evaluating to what extent variations in generat…

    Submitted 12 May, 2025; originally announced May 2025.

  2. arXiv:2501.02871  [pdf, other]

    cs.SD eess.AS

    Towards HRTF Personalization using Denoising Diffusion Models

    Authors: Juan Camilo Albarracín Sánchez, Luca Comanducci, Mirco Pezzoli, Fabio Antonacci

    Abstract: Head-Related Transfer Functions (HRTFs) have fundamental applications for realistic rendering in immersive audio scenarios. However, they are strongly subject-dependent as they vary considerably depending on the shape of the ears, head and torso. Thus, personalization procedures are required for accurate binaural rendering. Recently, Denoising Diffusion Probabilistic Models (DDPMs), a class of gen…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: to appear in ICASSP 2025

  3. arXiv:2409.10684  [pdf, other]

    eess.AS cs.SD

    FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models

    Authors: Luca Comanducci, Paolo Bestagini, Stefano Tubaro

    Abstract: Text-To-Music (TTM) models have recently revolutionized the automatic music generation research field. Specifically, by reaching superior performances to all previous state-of-the-art models and by lowering the technical proficiency needed to use them. Due to these reasons, they have readily started to be adopted for commercial uses and music production practices. This widespread diffusion of TTMs…

    Submitted 25 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

  4. MambaFoley: Foley Sound Generation using Selective State-Space Models

    Authors: Marco Furio Colombo, Francesca Ronchini, Luca Comanducci, Fabio Antonacci

    Abstract: Recent advancements in deep learning have led to widespread use of techniques for audio content generation, notably employing Denoising Diffusion Probabilistic Models (DDPM) across various tasks. Among these, Foley Sound Synthesis is of particular interest for its role in applications for the creation of multimedia content. Given the temporal-dependent nature of sound, it is crucial to design gene…

    Submitted 13 March, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted at ICASSP 2025

  5. arXiv:2407.04333  [pdf, other]

    cs.SD eess.AS

    PAGURI: a user experience study of creative interaction with text-to-music models

    Authors: Francesca Ronchini, Luca Comanducci, Gabriele Perego, Fabio Antonacci

    Abstract: In recent years, text-to-music models have been the biggest breakthrough in automatic music generation. While they are unquestionably a showcase of technological progress, it is not clear yet how they can be realistically integrated into the artistic practice of musicians and music practitioners. This paper aims to address this question via Prompt Audio Generation User Research Investigation (PAGU…

    Submitted 5 September, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  6. arXiv:2404.03436  [pdf, other]

    eess.AS cs.SD

    Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation

    Authors: Luca Comanducci, Fabio Antonacci, Augusto Sarti

    Abstract: Deep learning models are widely applied in the signal processing community, yet their inner working procedure is often treated as a black box. In this paper, we investigate the use of eXplainable Artificial Intelligence (XAI) techniques to learning-based end-to-end speech source localization models. We consider the Layer-wise Relevance Propagation (LRP) technique, which aims to determine which par…

    Submitted 26 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  7. arXiv:2403.17864  [pdf, other]

    eess.AS cs.SD eess.SP

    Synthetic training set generation using text-to-audio models for environmental sound classification

    Authors: Francesca Ronchini, Luca Comanducci, Fabio Antonacci

    Abstract: In recent years, text-to-audio models have revolutionized the field of automatic audio generation. This paper investigates their application in generating synthetic datasets for training data-driven models. Specifically, this study analyzes the performance of two environmental sound classification systems trained with data generated from text-to-audio models. We considered three scenarios: a) augm…

    Submitted 6 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  8. arXiv:2402.13896  [pdf, other]

    eess.AS eess.SP

    HOMULA-RIR: A Room Impulse Response Dataset for Teleconferencing and Spatial Audio Applications Acquired Through Higher-Order Microphones and Uniform Linear Microphone Arrays

    Authors: Federico Miotello, Paolo Ostan, Mirco Pezzoli, Luca Comanducci, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

    Abstract: In this paper, we present HOMULA-RIR, a dataset of room impulse responses (RIRs) acquired using both higher-order microphones (HOMs) and a uniform linear array (ULA), in order to model a remote attendance teleconferencing scenario. Specifically, measurements were performed in a seminar room, where a 64-microphone ULA was used as a multichannel audio acquisition system in the proximity of the speak…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted for publication at ICASSP 2024 - HSCMA Workshop

  9. arXiv:2402.04866  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Room Transfer Function Reconstruction Using Complex-valued Neural Networks and Irregularly Distributed Microphones

    Authors: Francesca Ronchini, Luca Comanducci, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

    Abstract: Reconstructing the room transfer functions needed to calculate the complex sound field in a room has several important real-world applications. However, an unpractical number of microphones is often required. Recently, in addition to classical signal processing methods, deep learning techniques have been applied to reconstruct the room transfer function starting from a very limited set of measurem…

    Submitted 11 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted at EUSIPCO 2024

  10. arXiv:2312.08821  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Reconstruction of Sound Field through Diffusion Models

    Authors: Federico Miotello, Luca Comanducci, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

    Abstract: Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probab…

    Submitted 21 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted for publication at ICASSP 2024

  11. arXiv:2307.04586  [pdf, other]

    eess.AS

    Timbre transfer using image-to-image denoising diffusion implicit models

    Authors: Luca Comanducci, Fabio Antonacci, Augusto Sarti

    Abstract: Timbre transfer techniques aim at converting the sound of a musical piece generated by one instrument into the same one as if it was played by another instrument, while maintaining as much as possible the content in terms of musical characteristics such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perf…

    Submitted 28 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

  12. Synthesis of Soundfields through Irregular Loudspeaker Arrays Based on Convolutional Neural Networks

    Authors: Luca Comanducci, Fabio Antonacci, Augusto Sarti

    Abstract: Most soundfield synthesis approaches deal with extensive and regular loudspeaker arrays, which are often not suitable for home audio systems, due to physical space constraints. In this article we propose a technique for soundfield synthesis through more easily deployable irregular loudspeaker arrays, i.e. where the spacing between loudspeakers is not constant, based on deep learning. The input are…

    Submitted 11 September, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

  13. arXiv:2002.00641  [pdf, ps, other]

    eess.AS cs.SD

    Time Difference of Arrival Estimation from Frequency-Sliding Generalized Cross-Correlations Using Convolutional Neural Networks

    Authors: Luca Comanducci, Maximo Cobos, Fabio Antonacci, Augusto Sarti

    Abstract: The interest in deep learning methods for solving traditional signal processing tasks has been steadily growing in the last years. Time delay estimation (TDE) in adverse scenarios is a challenging problem, where classical approaches based on generalized cross-correlations (GCCs) have been widely used for decades. Recently, the frequency-sliding GCC (FS-GCC) was proposed as a novel technique for TD…

    Submitted 3 February, 2020; originally announced February 2020.

    Comments: Paper accepted for presentation in ICASSP 2020

    MSC Class: 94A12; 68T10 ACM Class: I.2.0; I.5.4

  14. arXiv:1910.08838  [pdf, other]

    eess.AS

    Frequency-Sliding Generalized Cross-Correlation: A Sub-band Time Delay Estimation Approach

    Authors: Maximo Cobos, Fabio Antonacci, Luca Comanducci, Augusto Sarti

    Abstract: The generalized cross correlation (GCC) is regarded as the most popular approach for estimating the time difference of arrival (TDOA) between the signals received at two sensors. Time delay estimates are obtained by maximizing the GCC output, where the direct-path delay is usually observed as a prominent peak. Moreover, GCCs play also an important role in steered response power (SRP) localization…

    Submitted 24 March, 2020; v1 submitted 19 October, 2019; originally announced October 2019.

    Comments: Article accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing