Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Kharitonov, Eugene; Rivière, Morgane; Synnaeve, Gabriel; Wolf, Lior; Mazaré, Pierre-Emmanuel; Douze, Matthijs; Dupoux, Emmanuel

arXiv:2007.00991 (eess)

[Submitted on 2 Jul 2020]

Abstract: Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still under-performs other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks, by 12-15% relative.
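As an illustration of one of the augmentations named in the abstract, here is a minimal sketch of time-domain additive noise applied only to the "past" part of a waveform. It uses plain NumPy rather than WavAugment, and the function names, the `split` index and the 10 dB default are assumptions made for this example, not the authors' implementation or settings.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` at the requested signal-to-noise ratio (dB).

    Hypothetical helper; assumes `noise` is at least as long as `signal`.
    """
    signal = np.asarray(signal, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)[: len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


def augment_past_only(waveform, noise, split, snr_db=10.0):
    """Add noise to waveform[:split] (the 'past') and keep waveform[split:] clean.

    Splitting the raw waveform at one index is a simplification for illustration.
    """
    waveform = np.asarray(waveform, dtype=np.float64)
    out = waveform.copy()
    out[:split] = add_noise_at_snr(waveform[:split], noise, snr_db)
    return out
```

For example, `augment_past_only(wave, noise, split=len(wave) // 2)` would leave the second half of `wave` untouched, so the prediction targets stay clean while the context is perturbed.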

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2007.00991 [eess.AS]
	(or arXiv:2007.00991v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2007.00991

Submission history

From: Eugene Kharitonov
[v1] Thu, 2 Jul 2020 09:59:51 UTC (151 KB)
