Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition

Agrawal, Purvi; Ganapathy, Sriram

arXiv:2001.07067 (eess)

[Submitted on 20 Jan 2020]

Abstract: Speech recognition from raw waveform involves learning the spectral decomposition of the signal in the first layer of the neural acoustic model using a convolution layer. In this work, we propose a raw waveform convolutional filter learning approach using soft self-attention. The acoustic filter bank in the proposed model is implemented as a parametric cosine-modulated Gaussian filter bank whose parameters are learned. A network-in-network architecture provides self-attention to generate attention weights over the sub-band filters. The attention-weighted log filter bank energies are fed to the acoustic model for the task of speech recognition. Experiments are conducted on the Aurora-4 (additive noise with channel artifact) and CHiME-3 (additive noise with reverberation) databases. In these experiments, the attention-based filter learning approach provides considerable improvements in ASR performance over the baseline mel filter-bank features and other robust front-ends (average relative improvement in word error rate over baseline features of 7% on the Aurora-4 dataset and 5% on the CHiME-3 database). Using the self-attention weights, we also present an analysis of the interpretability of the filters for the ASR task.
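The front-end described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel form (a cosine carrier under a Gaussian envelope), the function names, the kernel length, the 16 kHz sample rate, and the single linear scoring layer followed by a softmax (standing in for the network-in-network attention module) are all assumptions made for illustration, and the parameters are fixed here rather than learned.

```python
import numpy as np

def cos_gauss_filterbank(center_hz, sigma, kernel_len=129, sample_rate=16000):
    """Build (num_filters, kernel_len) cosine-modulated Gaussian kernels.

    center_hz and sigma are the per-filter parameters that a model would
    learn; here they are simply passed in (assumed parameterization).
    """
    n = np.arange(kernel_len) - (kernel_len - 1) / 2  # centered sample index
    center_hz = np.asarray(center_hz, dtype=float)[:, None]
    sigma = np.asarray(sigma, dtype=float)[:, None]
    carrier = np.cos(2 * np.pi * center_hz * n / sample_rate)
    envelope = np.exp(-0.5 * (n / sigma) ** 2)
    return carrier * envelope

def log_subband_energies(frame, kernels, eps=1e-8):
    """Filter one waveform frame with each kernel; return log mean energies."""
    outputs = np.stack([np.convolve(frame, k, mode="valid") for k in kernels])
    return np.log(np.mean(outputs ** 2, axis=1) + eps)

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def attention_weighted_features(log_energies, w, b):
    """Hypothetical one-layer scorer producing soft weights over sub-bands."""
    weights = softmax(w @ log_energies + b)
    return weights * log_energies, weights

# Usage on a random 25 ms frame (400 samples at the assumed 16 kHz rate).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
kernels = cos_gauss_filterbank(np.linspace(200, 7000, 8), np.full(8, 20.0))
log_e = log_subband_energies(frame, kernels)
feats, weights = attention_weighted_features(
    log_e, 0.1 * rng.standard_normal((8, 8)), np.zeros(8)
)
```

Because the weights sum to one across sub-bands, inspecting them per frame gives a direct reading of which filters the scorer emphasizes, which is the kind of interpretability analysis the abstract refers to.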

Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2001.07067 [eess.AS]
	(or arXiv:2001.07067v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2001.07067

Submission history

From: Purvi Agrawal
[v1] Mon, 20 Jan 2020 11:39:44 UTC (240 KB)
