Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

Kürzinger, Ludwig; Lindae, Nicolas; Klewitz, Palle; Rigoll, Gerhard

arXiv:2010.07597 (eess)

[Submitted on 15 Oct 2020 (v1), last revised 16 Oct 2020 (this version, v2)]

Abstract: Many end-to-end Automatic Speech Recognition (ASR) systems still rely on pre-processed frequency-domain features that are handcrafted to emulate human hearing. Our work is motivated by recent advances in integrated learnable feature extraction. To this end, we propose Lightweight Sinc-Convolutions (LSC), which integrate Sinc-convolutions with depthwise convolutions as a low-parameter, machine-learnable feature extraction for end-to-end ASR systems.
We integrated LSC into the hybrid CTC/attention architecture for evaluation. The resulting end-to-end model shows smooth convergence behaviour that is further improved by applying SpecAugment in the time domain. We also discuss filter-level improvements, such as using log-compression as the activation function. Our model achieves a word error rate of 10.7% on the TEDlium v2 test dataset, surpassing the corresponding architecture with log-mel filterbank features by an absolute 1.9% while having only 21% of its model size.
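
To make the two filter-level ideas in the abstract more concrete, the following is a minimal NumPy sketch of a sinc band-pass kernel and a log-compression activation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the kernel size of 101, the 16 kHz sample rate, the Hamming window, and the exact form log(|x| + 1) are all assumptions that do not come from the abstract. The point is only that a sinc band-pass filter is fully described by two numbers (its lower and upper cutoff frequency), which is what keeps such a layer low in parameters.

```python
import numpy as np

def sinc_bandpass_kernel(f_low, f_high, kernel_size=101, sample_rate=16000):
    """Windowed band-pass FIR kernel defined only by two cutoff frequencies (Hz).

    Hypothetical helper: kernel size, sample rate, and window are assumptions.
    """
    # Symmetric time axis in seconds, centred on zero.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    t = n / sample_rate
    # np.sinc(x) = sin(pi*x) / (pi*x), so 2*f*sinc(2*f*t) is an ideal
    # low-pass impulse response with cutoff f. The difference of two
    # low-pass filters is a band-pass filter.
    low_pass_high = 2 * f_high * np.sinc(2 * f_high * t)
    low_pass_low = 2 * f_low * np.sinc(2 * f_low * t)
    # A Hamming window reduces ripple caused by truncating the ideal filter.
    kernel = (low_pass_high - low_pass_low) * np.hamming(kernel_size)
    # Dividing by the sample rate gives roughly unit gain in the passband.
    return kernel / sample_rate

def log_compression(x):
    """Assumed form of log-compression used as an activation: log(|x| + 1)."""
    return np.log(np.abs(x) + 1.0)

def gain_at(kernel, freq, sample_rate=16000):
    """Magnitude of the kernel's frequency response at one frequency (Hz)."""
    n = np.arange(len(kernel))
    return np.abs(np.sum(kernel * np.exp(-2j * np.pi * freq * n / sample_rate)))

if __name__ == "__main__":
    kernel = sinc_bandpass_kernel(1000.0, 3000.0)
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)               # one second of noise
    filtered = np.convolve(audio, kernel, mode="same")
    features = log_compression(filtered)
    print(features.shape, gain_at(kernel, 2000.0), gain_at(kernel, 6000.0))
```

In a learnable layer, the two cutoff frequencies per filter would be the trainable parameters, and the kernel would be rebuilt from them on each forward pass; the fixed values used here are for illustration only.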

Comments:	Accepted at INTERSPEECH 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:2010.07597 [eess.AS]
	(or arXiv:2010.07597v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2010.07597

Submission history

From: Ludwig Kürzinger
[v1] Thu, 15 Oct 2020 08:43:57 UTC (305 KB)
[v2] Fri, 16 Oct 2020 07:34:01 UTC (305 KB)
