Pre-training with Synthetic Patterns for Audio

Ishikawa, Yuchi; Komatsu, Tatsuya; Aoki, Yoshimitsu

arXiv:2410.00511 (eess)

[Submitted on 1 Oct 2024]

Abstract: In this paper, we propose pre-training audio encoders on synthetic patterns instead of real audio data. Our framework has two key elements. The first is the Masked Autoencoder (MAE), a self-supervised learning framework that learns by reconstructing data from randomly masked inputs. MAEs tend to focus on low-level information such as visual patterns and regularities within the data. Therefore, what the input depicts is unimportant, whether it be images, audio mel-spectrograms, or even synthetic patterns. This leads to the second key element, synthetic data, which, unlike real audio, is free from privacy and licensing infringement issues. By combining MAEs with synthetic patterns, our framework enables the model to learn generalized feature representations without real data while addressing the issues associated with real audio. To evaluate the framework, we conduct extensive experiments across 13 audio tasks and 17 synthetic datasets. The experiments provide insights into which types of synthetic patterns are effective for audio. Our results demonstrate that the framework achieves performance comparable to models pre-trained on AudioSet-2M and partially outperforms image-based pre-training methods.
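
The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sinusoidal pattern generator, the 64x96 input size, the 16x16 patch size, and the 0.75 mask ratio are all assumptions chosen for illustration, and the function names are hypothetical. The sketch shows only how a spectrogram-like synthetic pattern could be split into patches, randomly masked, and scored with a reconstruction loss restricted to the masked patches; the encoder and decoder are omitted.

```python
import numpy as np


def synthetic_pattern(height=64, width=96, n_waves=3, seed=0):
    """Hypothetical pattern: a sum of random 2-D sinusoids scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:height, 0:width]
    img = np.zeros((height, width))
    for _ in range(n_waves):
        fy, fx = rng.uniform(0.01, 0.2, size=2)  # spatial frequencies (assumed range)
        phase = rng.uniform(0, 2 * np.pi)
        img += np.sin(2 * np.pi * (fy * y + fx * x) + phase)
    img -= img.min()
    return img / max(img.max(), 1e-12)


def patchify(img, patch=16):
    """Split an (H, W) array into flattened, non-overlapping patch rows."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    grid = img.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return grid.reshape(gh * gw, patch * patch)


def random_mask(patches, mask_ratio=0.75, seed=0):
    """Return the visible patches and a boolean mask (True = hidden patch)."""
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(round(n * (1 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over the hidden patches only."""
    return float(((pred - target) ** 2)[mask].mean())


pattern = synthetic_pattern()
patches = patchify(pattern)
visible, mask = random_mask(patches)
# A trivial "prediction" of all zeros stands in for a decoder output here.
loss = masked_mse(np.zeros_like(patches), patches, mask)
print(patches.shape, visible.shape, int(mask.sum()), loss)
```

In an actual MAE, only the visible patches would be passed to the encoder, and a decoder would predict the hidden ones; the loss above would then compare that prediction against the original patches.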

Comments:	Submitted to ICASSP'25
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.00511 [eess.AS]
	(or arXiv:2410.00511v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2410.00511

Submission history

From: Yuchi Ishikawa
[v1] Tue, 1 Oct 2024 08:52:35 UTC (2,049 KB)
