Every Image Listens, Every Image Dances: Music-Driven Image Animation

Dong, Zhikang; Hao, Weituo; Wang, Ju-Chiang; Zhang, Peng; Polak, Pawel

arXiv:2501.18801 (cs)

[Submitted on 30 Jan 2025]

Abstract: Image animation has become a promising area in multimodal research, with a focus on generating videos from reference images. While prior work has largely emphasized generic video generation guided by text, music-driven dance video generation remains underexplored. In this paper, we introduce MuseDance, an end-to-end model that animates reference images using both music and text inputs. This dual input enables MuseDance to generate personalized videos that follow text descriptions and synchronize character movements with the music. Unlike existing approaches, MuseDance eliminates the need for complex motion guidance inputs, such as pose or depth sequences, making flexible and creative video generation accessible to users of all expertise levels. To advance research in this field, we present a new multimodal dataset comprising 2,904 dance videos with corresponding background music and text descriptions. Our approach leverages diffusion-based methods to achieve robust generalization, precise control, and temporal consistency, setting a new baseline for the music-driven image animation task.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.18801 [cs.CV]
	(or arXiv:2501.18801v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2501.18801

Submission history

From: Zhikang Dong
[v1] Thu, 30 Jan 2025 23:38:51 UTC (12,489 KB)
