Skip to main content

Showing 1–6 of 6 results for author: Roman, R S

Searching in archive eess. Search in all archives.
.
  1. arXiv:2501.01757  [pdf, other

    cs.SD eess.AS

    MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling

    Authors: Simon Rouard, Robin San Roman, Yossi Adi, Axel Roebel

    Abstract: While most music generation models generate a mixture of stems (in mono or stereo), we propose to train a multi-stem generative model with 3 stems (bass, drums and other) that learn the musical dependencies between them. To do so, we train one specialized compression algorithm per stem to tokenize the music into parallel streams of tokens. Then, we leverage recent improvements in the task of music… ▽ More

    Submitted 7 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

    Comments: 5 pages, 3 figures, accepted to ICASSP 2025

  2. arXiv:2409.02915  [pdf, other

    cs.SD eess.AS

    Latent Watermarking of Audio Generative Models

    Authors: Robin San Roman, Pierre Fernandez, Antoine Deleforge, Yossi Adi, Romain Serizel

    Abstract: The advancements in audio generative models have opened up new challenges in their responsible disclosure and the detection of their misuse. In response, we introduce a method to watermark latent generative models by a specific watermarking of their training data. The resulting watermarked models produce latent representations whose decoded outputs are detected with high confidence, regardless of… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2312.05187  [pdf, other

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek , et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

  4. arXiv:2308.02560  [pdf, other

    cs.SD cs.LG eess.AS

    From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

    Authors: Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez

    Abstract: Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generate audible artifacts when the condi… ▽ More

    Submitted 8 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 10 pages

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

  5. arXiv:2110.05948  [pdf, other

    eess.SP cs.AI cs.CV cs.GR cs.LG cs.SD eess.AS eess.IV

    Denoising Diffusion Gamma Models

    Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

    Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underlying noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom could improve the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion… ▽ More

    Submitted 10 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.07582

  6. arXiv:2106.07582  [pdf, other

    cs.LG cs.CV cs.SD eess.AS

    Non Gaussian Denoising Diffusion Models

    Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

    Abstract: Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underline noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom, could help the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion pro… ▽ More

    Submitted 14 June, 2021; originally announced June 2021.