Improving English to Sinhala Neural Machine Translation using Part-of-Speech Tag

Perera, Ravinga; Fonseka, Thilakshi; Naranpanawa, Rashmini; Thayasivam, Uthayasanker

arXiv:2202.08882 (cs)

[Submitted on 17 Feb 2022]

Abstract: The performance of Neural Machine Translation (NMT) depends significantly on the size of the available parallel corpus. As a result, low-resource language pairs show lower translation performance than high-resource pairs, and translation quality degrades further when the languages involved are morphologically rich. Although the web contains a large amount of information, most people in Sri Lanka are unable to read and understand English properly, so there is a strong need to translate English content into local languages to share information among locals. Sinhala is the primary language of Sri Lanka, and building an NMT system that produces quality English-to-Sinhala translations is difficult because of the syntactic divergence between the two languages under low-resource constraints. In this research, we therefore explore effective methods of incorporating Part-of-Speech (POS) tags into the Transformer input embedding and positional encoding to further enhance the performance of the baseline English-to-Sinhala neural machine translation model.
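
As a rough illustration of what "incorporating POS tags into the input embedding and positional encoding" could look like, the sketch below sums a token embedding, a POS-tag embedding, and the standard sinusoidal positional encoding for each source word. This is only one plausible combination scheme and is not taken from the paper: the function names (`sinusoidal_position`, `make_table`, `embed_with_pos_tags`), the toy sentence, the tag set, the random initialisation, and the choice of element-wise summation are all assumptions made for illustration.

```python
import math
import random


def sinusoidal_position(pos, d_model):
    """Standard sinusoidal positional encoding for a single position."""
    vec = []
    for i in range(d_model):
        angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
        # Even dimensions use sine, odd dimensions use cosine.
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def make_table(symbols, d_model, seed):
    """Hypothetical embedding table: small random vectors, one per symbol."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(d_model)] for s in symbols}


def embed_with_pos_tags(tokens, tags, tok_table, tag_table, d_model):
    """Assumed scheme: token embedding + POS-tag embedding + positional encoding."""
    if len(tokens) != len(tags):
        raise ValueError("each token needs exactly one POS tag")
    out = []
    for pos, (tok, tag) in enumerate(zip(tokens, tags)):
        pe = sinusoidal_position(pos, d_model)
        out.append([t + g + p for t, g, p in zip(tok_table[tok], tag_table[tag], pe)])
    return out


# Toy English source sentence with illustrative (assumed) POS tags.
D_MODEL = 8
tokens = ["the", "cat", "sleeps"]
tags = ["DET", "NOUN", "VERB"]
tok_table = make_table(tokens, D_MODEL, seed=0)
tag_table = make_table(sorted(set(tags)), D_MODEL, seed=1)
vectors = embed_with_pos_tags(tokens, tags, tok_table, tag_table, D_MODEL)
```

Each entry of `vectors` is the encoder input for one source word under this assumed scheme. In a trained model the two tables would be learned parameters, and concatenation or a tag-dependent positional term would be equally reasonable alternatives to summation.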

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
ACM classes:	I.2.7
Cite as:	arXiv:2202.08882 [cs.CL]
	(or arXiv:2202.08882v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2202.08882

Submission history

From: Rashmini Naranpanawa
[v1] Thu, 17 Feb 2022 19:45:50 UTC (227 KB)
