Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese

Nguyen, Duc-Vu; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy

arXiv:2102.12136 (cs)

This paper has been withdrawn by Duc-Vu Nguyen

[Submitted on 24 Feb 2021 (v1), last revised 16 Jun 2021 (this version, v2)]

Title:Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese

Authors:Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract:Word segmentation and part-of-speech tagging are two critical preliminary steps for downstream tasks in Vietnamese natural language processing. In practice, people tend to also take phrase boundaries into account when performing word segmentation and part-of-speech tagging, rather than processing text strictly word by word from left to right. In this paper, we implement this idea to improve word segmentation and part-of-speech tagging for Vietnamese by employing a simplified constituency parser. Our neural model for joint word segmentation and part-of-speech tagging has the architecture of a syllable-based CRF constituency parser. To reduce the complexity of parsing, we replace all constituent labels with a single label indicating a phrase. This model can be augmented with word boundaries and part-of-speech tags predicted by other tools. Because Vietnamese and Chinese share some linguistic phenomena, we evaluated the proposed model and its augmented versions on three Vietnamese benchmark datasets and six Chinese benchmark datasets. Our experimental results show that the proposed model outperforms previous work for both languages.
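The label simplification described in the abstract can be illustrated with a small sketch. It assumes trees in Penn-style bracketed form, with Vietnamese words written as underscore-joined syllables (a common corpus convention, assumed here). The code converts a word-level tree into syllable-indexed spans: each word becomes a span carrying its part-of-speech tag, and every phrase span receives one placeholder label, `P`. The function names and the `P` label are hypothetical; this is not the authors' code.

```python
from typing import List, Tuple, Union

# A node is either a leaf word (str) or (label, children).
Tree = Union[str, Tuple[str, list]]

PHRASE = "P"  # hypothetical single label that replaces every constituent label


def parse_bracketed(s: str) -> Tree:
    """Parse a Penn-style bracketed string such as '(S (NP (N a)) (V b))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # leaf word
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def to_syllable_spans(tree: Tree) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Return syllables and (start, end, label) spans over syllable indices.

    Assumes words appear only under preterminal (POS) nodes and that the
    syllables of a word are joined by underscores (an assumption).
    """
    syllables: List[str] = []
    spans: List[Tuple[int, int, str]] = []

    def visit(node: Tuple[str, list]) -> None:
        label, children = node
        start = len(syllables)
        if len(children) == 1 and isinstance(children[0], str):
            # Preterminal: all syllables of the word share its POS tag,
            # so this span also encodes the word boundary.
            syllables.extend(children[0].split("_"))
            spans.append((start, len(syllables), label))
        else:
            for child in children:
                visit(child)
            # Phrase: keep the span, drop the original label (NP, VP, ...).
            spans.append((start, len(syllables), PHRASE))

    visit(tree)
    return syllables, spans


syl, spans = to_syllable_spans(
    parse_bracketed("(S (NP (N sinh_viên)) (VP (V học) (NP (N toán))))")
)
print(syl)    # ['sinh', 'viên', 'học', 'toán']
print(spans)  # [(0, 2, 'N'), (0, 2, 'P'), (2, 3, 'V'), (3, 4, 'N'), (3, 4, 'P'), (2, 4, 'P'), (0, 4, 'P')]
```

Word boundaries and part-of-speech tags can be read back from the preterminal spans, so a span-based parser that predicts this reduced tree performs segmentation and tagging jointly. For Chinese, splitting a word into characters (for example `list(word)`) would take the place of the underscore split.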

Comments:	The comparison with existing methods in this paper is unfair because the Bi-LSTM hyper-parameters differ from those used in previous research. More importantly, there is a data leakage issue in this paper's experimental setup.
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2102.12136 [cs.CL]
	(or arXiv:2102.12136v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2102.12136

Submission history

From: Duc-Vu Nguyen
[v1] Wed, 24 Feb 2021 08:57:02 UTC (888 KB)
[v2] Wed, 16 Jun 2021 08:10:20 UTC (1 KB) (withdrawn)