Effective Neural Solution for Multi-Criteria Word Segmentation

He, Han; Wu, Lei; Yan, Hua; Gao, Zhimin; Feng, Yi; Townsend, George

Computer Science > Computation and Language

arXiv:1712.02856v2 (cs)

[Submitted on 7 Dec 2017 (v1), last revised 4 Jan 2018 (this version, v2)]

Abstract: We present a simple yet elegant solution to train a single joint model on multi-criteria corpora for Chinese Word Segmentation (CWS). Our novel design requires no private layers in the model architecture; instead, it introduces two artificial tokens at the beginning and end of the input sentence to specify the required target criterion. The rest of the model, including the Long Short-Term Memory (LSTM) layer and the Conditional Random Fields (CRF) layer, remains unchanged and is shared across all datasets, keeping the parameter count minimal and constant. On the Bakeoff 2005 and Bakeoff 2008 datasets, our design surpassed both single-criterion and multi-criteria state-of-the-art results. To the best of our knowledge, it is the first design to achieve state-of-the-art performance on datasets of this scale. Source code and corpora for this paper are available on GitHub.
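The central idea, marking the target criterion with two artificial tokens around the input sentence, can be illustrated with a minimal Python sketch of the data preparation step. It assumes character-level BMES tagging (a common label scheme for CRF-based segmentation), hypothetical token spellings such as `<pku>` and `</pku>`, and that each artificial token carries the `S` tag; none of these details are taken from the paper, and the helper names `word_tags`, `encode` and `decode` are likewise hypothetical.

```python
from typing import List, Tuple

# Bakeoff 2005 corpus names; the token spelling "<name>" / "</name>" used below
# is a hypothetical choice for illustration, not taken from the paper.
CRITERIA = {"pku", "msr", "as", "cityu"}


def word_tags(word: str) -> List[str]:
    """Return BMES tags for one word: S for a single character, else B, M..., E."""
    if not word:
        raise ValueError("empty word")
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def encode(words: List[str], criterion: str) -> Tuple[List[str], List[str]]:
    """Wrap a segmented sentence in criterion tokens and emit one tag per token."""
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion}")
    # Assumption: each artificial token is tagged S, like a one-character word.
    tokens, tags = [f"<{criterion}>"], ["S"]
    for word in words:
        tokens.extend(word)  # one token per character
        tags.extend(word_tags(word))
    tokens.append(f"</{criterion}>")
    tags.append("S")
    return tokens, tags


def decode(tokens: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from BMES tags, skipping the two artificial tokens."""
    words, current = [], ""
    for char, tag in zip(tokens[1:-1], tags[1:-1]):
        current += char
        if tag in ("S", "E"):  # a word ends at an S or E tag
            words.append(current)
            current = ""
    if current:  # flush a word left open by an ill-formed tag sequence
        words.append(current)
    return words


if __name__ == "__main__":
    toks, tgs = encode(["我们", "爱", "北京"], "pku")
    print(toks)  # ['<pku>', '我', '们', '爱', '北', '京', '</pku>']
    print(tgs)   # ['S', 'B', 'E', 'S', 'B', 'E', 'S']
    print(decode(toks, tgs))  # ['我们', '爱', '北京']
```

In this sketch, only the two boundary tokens differ between corpora, so the same sentence can be fed to one shared LSTM-CRF with a different target criterion; supporting an additional criterion would add two vocabulary entries rather than a new private layer.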

Comments:	2nd International Conference on Smart Computing & Informatics (SCI-2018), Springer Smart Innovation Systems and Technologies Book Series, Springer-Verlag, Accepted & Forthcoming, 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1712.02856 [cs.CL]
	(or arXiv:1712.02856v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1712.02856

Submission history

From: Han He
[v1] Thu, 7 Dec 2017 20:48:15 UTC (24 KB)
[v2] Thu, 4 Jan 2018 12:24:33 UTC (24 KB)
