SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

Comunità, Marco; Gramaccioni, Riccardo F.; Postolache, Emilian; Rodolà, Emanuele; Comminiello, Danilo; Reiss, Joshua D.

arXiv:2310.15247 (cs)

[Submitted on 23 Oct 2023]

Abstract: Sound design involves creatively selecting, recording, and editing sound effects for media such as cinema, video games, and virtual/augmented reality. One of the most time-consuming steps in designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available and can aid the process. In video games and animations, however, no reference audio exists, so event timings must be annotated manually from the video. We propose a system that extracts the onsets of repetitive actions from a video; these onsets are then used, in conjunction with audio or textual embeddings, to condition a diffusion model trained to generate a new, synchronized sound effects audio track. In this way, we leave complete creative control to the sound designer while removing the burden of synchronization with video. Furthermore, editing the onset track or changing the conditioning embedding requires much less effort than editing the audio track itself, which simplifies the sonification process. We provide sound examples, source code, and pretrained models to facilitate reproducibility.
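As a rough illustration of why editing an onset track is cheaper than editing audio, the sketch below represents onsets as a binary per-frame sequence and moves one event by changing two entries. The representation, the function names (`onsets_to_track`, `track_to_onsets`), the frame rate, and the footstep timings are all assumptions made for this example; the abstract does not specify how the authors encode the onset track.

```python
from typing import List


def onsets_to_track(onset_times_s: List[float], duration_s: float,
                    frame_rate: float) -> List[int]:
    """Build a binary onset track: 1 at frames where an action starts, else 0.

    Hypothetical representation for illustration only.
    """
    n_frames = int(round(duration_s * frame_rate))
    track = [0] * n_frames
    for t in onset_times_s:
        idx = int(round(t * frame_rate))
        if 0 <= idx < n_frames:  # silently skip onsets outside the clip
            track[idx] = 1
    return track


def track_to_onsets(track: List[int], frame_rate: float) -> List[float]:
    """Recover onset times (in seconds) from a binary onset track."""
    return [i / frame_rate for i, v in enumerate(track) if v]


# Assumed example: three footsteps in a 2-second clip at 10 frames per second.
track = onsets_to_track([0.2, 0.9, 1.6], duration_s=2.0, frame_rate=10.0)

# Retiming the second footstep is a two-entry edit on the onset track,
# with no waveform editing involved.
track[9] = 0
track[10] = 1

edited_onsets = track_to_onsets(track, frame_rate=10.0)
```

In a pipeline of the kind the abstract describes, a track like this, together with an audio or text embedding, would be the conditioning input to the generative model; that model is not sketched here.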

Subjects:	Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2310.15247 [cs.SD]
	(or arXiv:2310.15247v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2310.15247

Submission history

From: Marco Comunità
[v1] Mon, 23 Oct 2023 18:01:36 UTC (547 KB)
