On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings

Garcia-Silva, Andres; Denaux, Ronald; Gomez-Perez, Jose Manuel

doi:10.1016/j.future.2021.02.019


arXiv:2104.06200 (cs)

[Submitted on 13 Apr 2021]

Abstract: In essence, embedding algorithms work by optimizing the distance between a word and its usual context in order to generate an embedding space that encodes the distributional representation of words. In addition to single words or word pieces, other features that result from the linguistic analysis of text, including lexical, grammatical and semantic information, can be used to improve the quality of embedding spaces. However, until now we did not have a precise understanding of the impact that such individual annotations and their possible combinations may have on the quality of the embeddings. In this paper, we conduct a comprehensive study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus and quantify their impact on the resulting representations. Our results show that the effect of such annotations on the embeddings varies depending on the evaluation task. In general, we observe that learning embeddings using linguistic annotations contributes to achieving better evaluation results.
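The idea of learning distributional representations from annotated units rather than raw words can be illustrated with a minimal count-based sketch. It assumes a hypothetical toy corpus with hand-written lemma and part-of-speech annotations, a hypothetical `|`-joined unit format, and positive pointwise mutual information (PPMI) weighting as the embedding method. None of these choices come from the paper, whose actual annotation pipeline and embedding algorithm are not described in this abstract.

```python
from collections import Counter
from math import log, sqrt

# Hypothetical pre-annotated corpus: each token is (surface form, lemma, part of speech).
# These annotations are hand-written for illustration, not produced by the authors' tools.
CORPUS = [
    [("Cells", "cell", "NOUN"), ("divide", "divide", "VERB"), ("rapidly", "rapidly", "ADV")],
    [("The", "the", "DET"), ("cell", "cell", "NOUN"), ("divided", "divide", "VERB")],
    [("Neurons", "neuron", "NOUN"), ("fire", "fire", "VERB"), ("rapidly", "rapidly", "ADV")],
]
FEATURE_INDEX = {"form": 0, "lemma": 1, "pos": 2}


def to_units(sentence, features):
    """Map each annotated token to a vocabulary unit built from the chosen features."""
    return ["|".join(tok[FEATURE_INDEX[f]] for f in features) for tok in sentence]


def cooccurrences(corpus, features, window=2):
    """Count (unit, context unit) pairs within a symmetric window."""
    counts = Counter()
    for sentence in corpus:
        units = to_units(sentence, features)
        for i, unit in enumerate(units):
            lo, hi = max(0, i - window), min(len(units), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(unit, units[j])] += 1
    return counts


def ppmi_vectors(counts):
    """Sparse PPMI vectors: a simple count-based distributional embedding."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (w, c), n in counts.items():
        row[w] += n
        col[c] += n
    vectors = {w: {} for w in row}  # every unit gets a (possibly empty) vector
    for (w, c), n in counts.items():
        pmi = log(n * total / (row[w] * col[c]))
        if pmi > 0:  # keep only positive associations
            vectors[w][c] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


form_vecs = ppmi_vectors(cooccurrences(CORPUS, ("form",)))
lemma_pos_vecs = ppmi_vectors(cooccurrences(CORPUS, ("lemma", "pos")))
print(len(form_vecs), len(lemma_pos_vecs))  # 8 6
print(cosine(form_vecs["Cells"], form_vecs["cell"]))  # 0.0
```

With surface forms alone, "Cells" and "cell" are separate units that share no contexts in this toy corpus, so their similarity is zero; with lemma and part-of-speech annotations they collapse into the single unit `cell|NOUN`, pooling their co-occurrence evidence. This is only one way annotations can change an embedding space, and the toy numbers say nothing about the paper's reported results.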

Comments:	Accepted for publication in Future Generation Computer Systems
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2104.06200 [cs.CL]
	(or arXiv:2104.06200v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2104.06200
Related DOI:	https://doi.org/10.1016/j.future.2021.02.019

Submission history

From: Jose Manuel Gomez-Perez
[v1] Tue, 13 Apr 2021 13:51:22 UTC (1,539 KB)