STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Naeem, Saad; Beg, Omer

Computer Science > Computation and Language

arXiv:2204.07848 (cs)

[Submitted on 16 Apr 2022]

Title:STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Authors:Saad Naeem, Omer Beg

View PDF

Abstract:Phoneme recognition is a largely unsolved problem in NLP, especially for low-resource languages like Urdu. The systems that try to extract the phonemes from audio speech require hand-labeled phonetic transcriptions. This requires expert linguists to annotate speech data with its relevant phonetic representation which is both an expensive and a tedious task. In this paper, we propose STRATA, a framework for supervised phoneme recognition that overcomes the data scarcity issue for low resource languages using a seq2seq neural architecture integrated with transfer learning, attention mechanism, and data augmentation. STRATA employs transfer learning to reduce the network loss in half. It uses attention mechanism for word boundaries and frame alignment detection which further reduces the network loss by 4% and is able to identify the word boundaries with 92.2% accuracy. STRATA uses various data augmentation techniques to further reduce the loss by 1.5% and is more robust towards new signals both in terms of generalization and accuracy. STRATA is able to achieve a Phoneme Error Rate of 16.5% and improves upon the state of the art by 1.1% for TIMIT dataset (English) and 11.5% for CSaLT dataset (Urdu).

Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.07848 [cs.CL]
	(or arXiv:2204.07848v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.07848

Submission history

From: Saad Naeem [view email]
[v1] Sat, 16 Apr 2022 17:43:17 UTC (4,290 KB)

Computer Science > Computation and Language

Title:STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators