A Comprehensive Analysis of Static Word Embeddings for Turkish

Sarıtaş, Karahan; Öz, Cahid Arda; Güngör, Tunga

doi:10.1016/j.eswa.2024.124123

Computer Science > Computation and Language

arXiv:2405.07778 (cs)

[Submitted on 13 May 2024]

Authors: Karahan Sarıtaş, Cahid Arda Öz, Tunga Güngör

Abstract: Word embeddings are fixed-length, dense, and distributed word representations used in natural language processing (NLP) applications. There are two main types of word embedding models: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word depending on the specific contexts in which it appears. Many works compare contextual or non-contextual embedding models within their respective groups in different languages. However, very few studies compare models from these two groups with each other, and no such study exists for Turkish. Such a comparison requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available.
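
The abstract notes that comparing the two model families requires turning contextual embeddings into static ones. One common way to do this is to average the contextual vectors a model produces for a word across many sentences, yielding one vector per word. This is an assumption for illustration only; the abstract does not say which conversion method the authors use. The sketch below shows that averaging step plus the cosine similarity typically used in intrinsic evaluation. The function names `static_from_contextual` and `cosine` are hypothetical, and the toy 2-dimensional vectors stand in for real model outputs.

```python
from collections import defaultdict
from math import sqrt


def static_from_contextual(occurrences):
    """Average all contextual vectors seen for each word into one static vector.

    Hypothetical helper (assumed mean-pooling approach, not confirmed as the
    paper's method). `occurrences` is an iterable of (word, vector) pairs, one
    pair per occurrence of the word in some context.
    """
    sums = {}                  # word -> running element-wise sum of its vectors
    counts = defaultdict(int)  # word -> number of occurrences seen
    for word, vec in occurrences:
        if word not in sums:
            sums[word] = [0.0] * len(vec)
        acc = sums[word]
        for i, x in enumerate(vec):
            acc[i] += x
        counts[word] += 1
    # Divide each summed vector by its occurrence count to get the mean.
    return {w: [x / counts[w] for x in acc] for w, acc in sums.items()}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    # Invented toy vectors: "yüz" ("face", "hundred", or "swim") appears in two
    # contexts that a contextual model would embed differently.
    occurrences = [
        ("yüz", [1.0, 0.0]),
        ("yüz", [0.0, 1.0]),
        ("kedi", [0.5, 0.5]),
    ]
    static = static_from_contextual(occurrences)
    print(static["yüz"])
    print(cosine(static["yüz"], static["kedi"]))
```

Averaging collapses the different senses of a polysemous word such as "yüz" into a single point, which is exactly the information a static model cannot represent; other pooling choices (for example, which layers or subword pieces to combine) would change the resulting vectors.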

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.07778 [cs.CL]
	(or arXiv:2405.07778v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.07778
Journal reference:	Expert Systems with Applications Volume 252, Part A, 15 October 2024, 124123
Related DOI:	https://doi.org/10.1016/j.eswa.2024.124123

Submission history

From: Karahan Sarıtaş
[v1] Mon, 13 May 2024 14:23:37 UTC (70 KB)
