A Generative Model for Raw Audio Using Transformer Architectures

Verma, Prateek; Chafe, Chris

arXiv:2106.16036 (cs)

[Submitted on 30 Jun 2021 (v1), last revised 8 Jul 2021 (this version, v3)]

Abstract: This paper proposes a novel approach to audio synthesis at the waveform level using Transformer architectures. We propose a deep neural network for generating waveforms, similar to WaveNet. The model is fully probabilistic, autoregressive, and causal, i.e., each generated sample depends only on previously observed samples. Our approach outperforms a widely used WaveNet architecture by up to 9% on a similar dataset at next-step prediction. The attention mechanism enables the architecture to learn which audio samples are important for predicting the future sample. We show how causal Transformer generative models can be used for raw waveform synthesis. We also show that this performance can be improved by a further 2% by conditioning samples over a wider context. The flexibility of the current model to synthesize audio from latent representations suggests a large number of potential applications. However, this novel approach of using generative Transformer architectures for raw audio synthesis is still far from generating any meaningful music without latent codes or metadata to aid the generation process.
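The causality property described in the abstract can be illustrated with a minimal sketch of masked (causal) scaled dot-product self-attention. This is generic textbook attention, not the authors' architecture: the function names (`softmax`, `causal_attention_weights`, `causal_attention`), the toy two-dimensional embeddings, and the use of the inputs directly as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries, keys):
    """Return a T x T weight matrix in which row t attends only to
    positions 0..t; every future position receives weight 0."""
    T = len(queries)
    d = len(queries[0])
    weights = []
    for t in range(T):
        # Scaled dot-product scores against past and present keys only.
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Pad the masked (future) positions with zeros.
        weights.append(softmax(scores) + [0.0] * (T - t - 1))
    return weights


def causal_attention(queries, keys, values):
    """Weighted sum of value vectors under the causal weights."""
    weights = causal_attention_weights(queries, keys)
    d_v = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(d_v)]
        for row in weights
    ]


# Hypothetical toy "sample embeddings" for three time steps.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention_weights(x, x)
outputs = causal_attention(x, x, x)
```

In this sketch each row of `weights` sums to 1 over the current and earlier positions, so changing a later sample leaves the outputs at earlier time steps unchanged. The nonzero entries in each row are the quantities that, in a trained model, would indicate which past samples matter for the next prediction.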

Comments:	DAFX 2021
Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2106.16036 [cs.SD]
	(or arXiv:2106.16036v3 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2106.16036

Submission history

From: Prateek Verma
[v1] Wed, 30 Jun 2021 13:05:31 UTC (1,447 KB)
[v2] Sat, 3 Jul 2021 12:41:18 UTC (1,446 KB)
[v3] Thu, 8 Jul 2021 15:28:02 UTC (1,447 KB)
