Showing 1–11 of 11 results for author: Mancusi, M

Searching in archive cs.
  1. arXiv:2506.16889  [pdf, ps, other]

    cs.SD eess.AS

    ITO-Master: Inference-Time Optimization for Audio Effects Modeling of Music Mastering Processors

    Authors: Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Michele Mancusi, Yuki Mitsufuji

    Abstract: Music mastering style transfer aims to model and apply the mastering characteristics of a reference track to a target track, simulating the professional mastering process. However, existing methods apply fixed processing based on a reference track, limiting users' ability to fine-tune the results to match their artistic intent. In this paper, we introduce the ITO-Master framework, a reference-base…

    Submitted 2 July, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

    Comments: ISMIR 2025

  2. arXiv:2504.10826  [pdf, other]

    cs.SD cs.MM eess.AS

    SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing

    Authors: Xinlei Niu, Kin Wai Cheuk, Jing Zhang, Naoki Murata, Chieh-Hsin Lai, Michele Mancusi, Woosung Choi, Giorgio Fabbro, Wei-Hsiang Liao, Charles Patrick Martin, Yuki Mitsufuji

    Abstract: Music editing is an important step in music production, which has broad applications, including game development and film production. Most existing zero-shot text-guided methods rely on pretrained diffusion models by involving forward-backward diffusion processes for editing. However, these methods often struggle to maintain the music content consistency. Additionally, text instructions alone usua…

    Submitted 14 April, 2025; originally announced April 2025.

  3. arXiv:2504.05690  [pdf, other]

    cs.SD eess.AS

    STAGE: Stemmed Accompaniment Generation through Prefix-Based Conditioning

    Authors: Giorgio Strano, Chiara Ballanti, Donato Crisostomi, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

    Abstract: Recent advances in generative models have made it possible to create high-quality, coherent music, with some systems delivering production-level output. Yet, most existing models focus solely on generating music from scratch, limiting their usefulness for musicians who want to integrate such models into a human, iterative composition workflow. In this paper we introduce STAGE, our STemmed Accompan…

    Submitted 9 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  4. arXiv:2409.11145  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    High-Resolution Speech Restoration with Latent Diffusion Model

    Authors: Tushar Dhyani, Florian Lux, Michele Mancusi, Giorgio Fabbro, Fritz Hohl, Ngoc Thang Vu

    Abstract: Traditional speech enhancement methods often oversimplify the task of restoration by focusing on a single type of distortion. Generative models that handle multiple distortions frequently struggle with phone reconstruction and high-frequency harmonics, leading to breathing and gasping artifacts that reduce the intelligibility of reconstructed speech. These models are also computationally demanding…

    Submitted 10 February, 2025; v1 submitted 17 September, 2024; originally announced September 2024.

  5. arXiv:2409.06096  [pdf, ps, other]

    cs.SD cs.AI cs.IR eess.AS

    Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer

    Authors: Michele Mancusi, Yurii Halychanskyi, Kin Wai Cheuk, Eloi Moliner, Chieh-Hsin Lai, Stefan Uhlich, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Yuki Mitsufuji

    Abstract: Music timbre transfer is a challenging task that involves modifying the timbral characteristics of an audio signal while preserving its melodic structure. In this paper, we propose a novel method based on dual diffusion bridges, trained using the CocoChorales Dataset, which consists of unpaired monophonic single-instrument audio data. Each diffusion model is trained on a specific instrument with a…

    Submitted 7 January, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

  6. arXiv:2404.16969  [pdf, other]

    cs.SD cs.LG eess.AS

    COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

    Authors: Ruben Ciranni, Giorgio Mariani, Michele Mancusi, Emilian Postolache, Giorgio Fabbro, Emanuele Rodolà, Luca Cosmo

    Abstract: We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of the stems composing music tracks and can input features obtained via Harmonic-Percussive Separation (HPS). COCOLA allows the objective evaluation of generative mo…

    Submitted 9 January, 2025; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Demo page: https://github.com/gladia-research-group/cocola, Accepted at ICASSP-25

  7. Accelerating Transformer Inference for Translation via Parallel Decoding

    Authors: Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà

    Abstract: Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network architectures and learning-based methods to solve this issue, which are expensive and require changes to the MT model, trading inference speed at the cost of the translation quality. In this paper, we propose to address the problem from the point of view of decoding a…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023 main conference

  8. arXiv:2302.02257  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

    Authors: Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

    Abstract: In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment on the partial generation task of source imputation, where we generate…

    Submitted 18 March, 2024; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: ICLR 2024 oral presentation. Demo page: https://gladia-research-group.github.io/multi-source-diffusion-models/

  9. arXiv:2301.08562  [pdf, other]

    cs.LG cs.SD eess.AS

    Latent Autoregressive Source Separation

    Authors: Emilian Postolache, Giorgio Mariani, Michele Mancusi, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

    Abstract: Autoregressive models have achieved impressive results over a wide range of domains in terms of generation quality and downstream task performance. In the continuous domain, a key factor behind this success is the usage of quantized latent spaces (e.g., obtained via VQ-VAE autoencoders), which allow for dimensionality reduction and faster inference times. However, using existing pre-trained models…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023

  10. arXiv:2201.05013  [pdf, other]

    cs.SD cs.LG eess.AS

    Fish sounds: towards the evaluation of marine acoustic biodiversity through data-driven audio source separation

    Authors: Michele Mancusi, Nicola Zonca, Emanuele Rodolà, Silvia Zuffi

    Abstract: The marine ecosystem is changing at an alarming rate, exhibiting biodiversity loss and the migration of tropical species to temperate basins. Monitoring the underwater environments and their inhabitants is of fundamental importance to understand the evolution of these systems and implement safeguard policies. However, assessing and tracking biodiversity is often a complex task, especially in large…

    Submitted 14 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  11. arXiv:2110.05313  [pdf, other]

    cs.LG cs.SD eess.AS

    Unsupervised Source Separation via Bayesian Inference in the Latent Domain

    Authors: Michele Mancusi, Emilian Postolache, Giorgio Mariani, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

    Abstract: State of the art audio source separation models rely on supervised data-driven approaches, which can be expensive in terms of labeling resources. On the other hand, approaches for training these models without any direct supervision are typically high-demanding in terms of memory and time requirements, and remain impractical to be used at inference time. We aim to tackle these limitations by propo…

    Submitted 30 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, submitted to Interspeech 2022