The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network

Li, Xiaoran; Takano, Toshiaki

doi:10.11517/jsaislud.92.0_06

arXiv:2208.05462 (cs)

[Submitted on 10 Aug 2022]

Abstract: A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks, and we believe that by learning the smallest unit of meaning, computers can more easily understand human language. However, existing sememe KBs are built only by manual annotation. Human annotations carry personal biases in understanding, the meanings of words change over time, and manual methods are not always practical. To address these issues, we propose an unsupervised method based on a deep clustering network (DCN) to build a sememe KB; the method can be applied to any language. We first learn distributed representations of multilingual words, use MUSE to align them in a single vector space, learn the multi-layer meaning of each word through a self-attention mechanism, and use a DCN to cluster sememe features. Finally, we completed the prediction using only a 10-dimensional sememe space in English. We found that the low-dimensional space still retains the main features of the sememes.
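
The final clustering step of the pipeline can be illustrated with a small sketch. This is not the authors' DCN: it uses plain k-means (Lloyd's algorithm) as a stand-in for the clustering objective and skips the embedding, MUSE alignment, and self-attention stages entirely. The 2-dimensional vectors, the word list, and the initial centroids are hypothetical values chosen for illustration, assuming the multilingual words have already been mapped into one shared low-dimensional space.

```python
from math import dist


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if empty)."""
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            moved.append(old)
    return moved


def kmeans(points, centroids, iters=20):
    """Plain Lloyd's k-means; a stand-in for the DCN clustering step."""
    labels = assign(points, centroids)
    for _ in range(iters):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical aligned latent codes for English and French words.
latent = {
    "cat": (0.90, 0.10), "dog": (1.00, 0.20), "chat": (0.95, 0.15),
    "run": (0.10, 0.90), "walk": (0.20, 1.00), "courir": (0.15, 0.95),
}
words = list(latent)
labels, centroids = kmeans(
    [latent[w] for w in words],
    centroids=[(1.0, 0.0), (0.0, 1.0)],  # hypothetical fixed initialization
)
clusters = dict(zip(words, labels))
print(clusters)
```

In this toy setting each cluster index plays the role of one candidate sememe feature: words from different languages that land in the same cluster would share that feature. A real deep clustering network would learn the latent codes jointly with the cluster assignments rather than taking them as fixed inputs.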

Comments:	6 pages, 6 figures, published in SIG-SLUD
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2208.05462 [cs.CL]
	(or arXiv:2208.05462v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2208.05462
Journal reference:	Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), p. 06-, 92nd (2021/9)
Related DOI:	https://doi.org/10.11517/jsaislud.92.0_06

Submission history

From: Xiaoran Li [view email]
[v1] Wed, 10 Aug 2022 17:40:45 UTC (836 KB)
