Showing 1–3 of 3 results for author: Benita, R

  1. arXiv:2504.06778

    cs.SD eess.AS

    CAFA: a Controllable Automatic Foley Artist

    Authors: Roi Benita, Michael Finkelson, Tavi Halperin, Gleb Sterkin, Yossi Adi

    Abstract: Foley, a key element in video production, refers to the process of adding an audio signal to a silent video while ensuring semantic and temporal alignment. In recent years, the rise of personalized content creation and advancements in automatic video-to-audio models have increased the demand for greater user control in the process. One possible approach is to incorporate text to guide audio gene…

    Submitted 17 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: Renamed paper to "CAFA: a Controllable Automatic Foley Artist" from "Controllable Automatic Foley Artist". Updated link to demo page

  2. arXiv:2502.00180

    cs.LG stat.ML

    Spectral Analysis of Diffusion Models with Application to Schedule Design

    Authors: Roi Benita, Michael Elad, Joseph Keshet

    Abstract: Diffusion models (DMs) have emerged as powerful tools for modeling complex data distributions and generating realistic new samples. Over the years, advanced architectures and sampling methods have been developed to make these models practically usable. However, certain synthesis process decisions still rely on heuristics without a solid theoretical foundation. In our work, we offer a novel analysi…

    Submitted 31 May, 2025; v1 submitted 31 January, 2025; originally announced February 2025.

  3. arXiv:2310.01381

    cs.SD cs.CL eess.AS

    DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

    Authors: Roi Benita, Michael Elad, Joseph Keshet

    Abstract: Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a vocoder). This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, g…

    Submitted 10 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.