Bidirectional Autoregressive Diffusion Model for Dance Generation

Zhang, Canyu; Tang, Youbao; Zhang, Ning; Lin, Ruei-Sung; Han, Mei; Xiao, Jing; Wang, Song

arXiv:2402.04356 (cs)

[Submitted on 6 Feb 2024 (v1), last revised 22 Jun 2024 (this version, v4)]

Abstract: Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.

Subjects:	Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2402.04356 [cs.SD]
	(or arXiv:2402.04356v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2402.04356

Submission history

From: Canyu Zhang
[v1] Tue, 6 Feb 2024 19:42:18 UTC (14,844 KB)
[v2] Wed, 5 Jun 2024 03:57:03 UTC (23,648 KB)
[v3] Thu, 6 Jun 2024 08:56:56 UTC (23,650 KB)
[v4] Sat, 22 Jun 2024 14:19:45 UTC (22,761 KB)
