False-Friend Detection and Entity Matching via Unsupervised Transliteration

Chen, Yanqing; Skiena, Steven

arXiv:1611.06722 (cs)

[Submitted on 21 Nov 2016]

Abstract: Transliterations play an important role in multilingual entity reference resolution, because proper names increasingly travel between languages in news and social media. Previous work associated with machine translation targets transliteration only between single language pairs, focuses on specific classes of entities (such as cities and celebrities), and relies on manual curation, which limits the expressive power of transliteration in multilingual environments.
By contrast, we present an unsupervised transliteration model covering 69 major languages that can generate good transliterations for arbitrary strings between any language pair. In matching gold-standard transliterations, our model achieves average top-1, top-20, and top-100 match rates of 32.85%, 60.44%, and 83.20%, compared with 26.71%, 50.27%, and 72.79% for a recently published system. We also demonstrate the quality of our model in detecting true and false friends among high-frequency Wikipedia lexicons. Our method captures a strong pronunciation-similarity signal and boosts the probability of finding true friends in 68 of the 69 languages.
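The top-k figures reported in the abstract measure how often the gold-standard transliteration appears among a model's k highest-ranked candidates. The following is a minimal sketch of how such a match rate could be computed, assuming each query has exactly one gold transliteration and an already-ranked candidate list, and assuming a plain unweighted mean when averaging across languages. The function names `top_k_match_rate` and `macro_average` and the toy data are hypothetical; the paper's exact evaluation protocol is not described here.

```python
from typing import Dict, Sequence


def top_k_match_rate(
    ranked_candidates: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose gold transliteration is among the first k candidates.

    Hypothetical protocol (an assumption, not the paper's): one gold string
    per query, candidates sorted from most to least likely.
    """
    if len(ranked_candidates) != len(gold):
        raise ValueError("need exactly one gold transliteration per query")
    if not gold:
        return 0.0
    # Count a hit when the gold string occurs in the truncated top-k list.
    hits = sum(1 for cands, g in zip(ranked_candidates, gold) if g in cands[:k])
    return hits / len(gold)


def macro_average(per_language: Dict[str, float]) -> float:
    """Unweighted mean over languages (an assumed way to form an 'average')."""
    if not per_language:
        return 0.0
    return sum(per_language.values()) / len(per_language)


# Toy data, invented purely for illustration.
ranked = [["moskva", "moskwa", "moscow"], ["pari", "paris"], ["berlin"]]
gold = ["moscow", "paris", "rome"]
for k in (1, 2, 3):
    print(k, round(top_k_match_rate(ranked, gold, k), 3))
```

Because each candidate list is only truncated more loosely as k grows, the match rate can never decrease with k, which is consistent with the top-100 figures exceeding the top-1 figures for both systems.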

Comments:	11 pages, ACL style
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1611.06722 [cs.CL]
	(or arXiv:1611.06722v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1611.06722

Submission history

From: Yanqing Chen
[v1] Mon, 21 Nov 2016 11:07:11 UTC (1,321 KB)