DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models

Yang, Sicheng; Wu, Zhiyong; Li, Minglei; Zhang, Zhensong; Hao, Lei; Bao, Weihong; Cheng, Ming; Xiao, Long

arXiv:2305.04919 (cs)

[Submitted on 8 May 2023]

Abstract: Beyond speech, the art of communication also includes gestures. Automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the rhythm and semantics of a gesture to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion-model-based, speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures from given speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention to the gesture diffusion pipeline to generate better speech-matched and more realistic gestures. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures with different initial gestures and noise. Extensive experiments show that our method outperforms recent approaches on speech-driven gesture generation. Our code, pre-trained models, and demos are available at this https URL.
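
The style control by interpolation or extrapolation mentioned in the abstract can be illustrated with the generic classifier-free guidance blend. The sketch below is not the authors' code: the function name `guided_prediction`, the scale parameter `gamma`, and the toy vectors are all hypothetical, and the paper's exact formulation may differ. It only assumes that a denoiser yields one prediction with the style condition and one without it.

```python
from typing import List, Sequence


def guided_prediction(
    cond: Sequence[float], uncond: Sequence[float], gamma: float
) -> List[float]:
    """Blend conditional and unconditional predictions (generic CFG form).

    gamma = 0 returns the unconditional prediction, gamma = 1 the
    conditional one; 0 < gamma < 1 interpolates between them and
    gamma > 1 extrapolates past the conditional prediction.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]


# Toy stand-ins for denoiser outputs (hypothetical values, not real data).
styled = [1.0, 2.0]   # prediction with the style condition
neutral = [0.0, 0.0]  # prediction with the condition dropped

print(guided_prediction(styled, neutral, 0.5))  # interpolation: weaker style
print(guided_prediction(styled, neutral, 2.0))  # extrapolation: exaggerated style
```

In this form a single scalar moves the output between the unconditional and conditional predictions, or beyond the conditional one, so style strength can be adjusted at sampling time without retraining.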

Comments:	11 pages, 9 figures, IJCAI 2023
Subjects:	Human-Computer Interaction (cs.HC); Multimedia (cs.MM)
Cite as:	arXiv:2305.04919 [cs.HC]
	(or arXiv:2305.04919v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2305.04919

Submission history

From: Sicheng Yang
[v1] Mon, 8 May 2023 17:54:58 UTC (2,844 KB)
