A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

Bie, Xiaoyu; Girin, Laurent; Leglaive, Simon; Hueber, Thomas; Alameda-Pineda, Xavier

arXiv:2106.06500 (cs)

[Submitted on 11 Jun 2021 (v1), last revised 14 Jun 2021 (this version, v2)]

Abstract: The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, that not only model the latent space, but also model the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In the present paper, we present the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.

Comments:	Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2008.12595
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2106.06500 [cs.SD]
	(or arXiv:2106.06500v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2106.06500

Submission history

From: Xiaoyu Bie
[v1] Fri, 11 Jun 2021 16:53:20 UTC (36 KB)
[v2] Mon, 14 Jun 2021 11:03:56 UTC (36 KB)
