-
ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis
Abstract: The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated as scansion and rhyme systems only exist for individual languages, making comparative studies very challenging and time-consuming. In this work, we present ALBERTI, the first multilingual pre-trained large language model f…
Submitted 3 July, 2023; originally announced July 2023.
Comments: Accepted for publication at SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing
-
arXiv:2306.01325
LyricSIM: A novel Dataset and Benchmark for Similarity Detection in Spanish Song LyricS
Abstract: In this paper, we present a new dataset and benchmark tailored to the task of semantic similarity in song lyrics. Our dataset, originally consisting of 2775 pairs of Spanish songs, was annotated in a collective annotation experiment by 63 native annotators. After collecting and refining the data to ensure a high degree of consensus and data integrity, we obtained 676 high-quality annotated pairs t…
Submitted 2 June, 2023; originally announced June 2023.
Comments: Accepted to Congreso Internacional de la Sociedad Española para el Procesamiento del Lenguaje Natural 2023 (SEPLN2023)
-
arXiv:2011.09567
Predicting metrical patterns in Spanish poetry with language models
Abstract: In this paper, we compare the automated metrical pattern identification systems available for Spanish against extensive experiments in which language models are fine-tuned on the same task. Although BERT was initially conceived as a model suited to semantic tasks, our results suggest that BERT-based models retain enough structural information to perform reasonably well for Spanish scansion.
Submitted 18 November, 2020; originally announced November 2020.
Comments: LXAI Workshop @ NeurIPS 2020