
Showing 1–5 of 5 results for author: Ortega-Martín, M

Searching in archive cs.
  1. arXiv:2410.02283 [pdf]

    cs.CL cs.AI

    Morphological evaluation of subwords vocabulary used by BETO language model

    Authors: Óscar García-Sierra, Ana Fernández-Pampillón Cesteros, Miguel Ortega-Martín

    Abstract: Subword tokenization algorithms used by Large Language Models are significantly more efficient and can independently build the necessary vocabulary of words and subwords without human intervention. However, those subwords do not always align with real morphemes, potentially impacting the models' performance, though it remains uncertain when this might occur. In previous research, we proposed a met…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: in Spanish language

  2. arXiv:2406.11218 [pdf]

    cs.CL cs.AI

    Building another Spanish dictionary, this time with GPT-4

    Authors: Miguel Ortega-Martín, Óscar García-Sierra, Alfonso Ardoiz, Juan Carlos Armenteros, Ignacio Garrido, Jorge Álvarez, Camilo Torrón, Iñigo Galdeano, Ignacio Arranz, Oleg Vorontsov, Adrián Alonso

    Abstract: We present the "Spanish Built Factual Freectianary 2.0" (Spanish-BFF-2) as the second iteration of an AI-generated Spanish dictionary. Previously, we developed the inaugural version of this unique free dictionary employing GPT-3. In this study, we aim to improve the dictionary by using GPT-4-turbo instead. Furthermore, we explore improvements made to the initial version and compare the performance…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2403.03538 [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    RADIA -- Radio Advertisement Detection with Intelligent Analytics

    Authors: Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov

    Abstract: Radio advertising remains an integral part of modern marketing strategies, with its appeal and potential for targeted reach undeniably effective. However, the dynamic nature of radio airtime and the rising trend of multiple radio spots necessitate an efficient system for monitoring advertisement broadcasts. This study investigates a novel automated radio advertisement detection technique incorpor…

    Submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2302.12746 [pdf, other]

    cs.CL

    Spanish Built Factual Freectianary (Spanish-BFF): the first AI-generated free dictionary

    Authors: Miguel Ortega-Martín, Óscar García-Sierra, Alfonso Ardoiz, Juan Carlos Armenteros, Jorge Álvarez, Adrián Alonso

    Abstract: Dictionaries are one of the oldest and most used linguistic resources. Building them is a complex task that, to the best of our knowledge, has yet to be explored with generative Large Language Models (LLMs). We introduce the "Spanish Built Factual Freectianary" (Spanish-BFF) as the first Spanish AI-generated dictionary. This first-of-its-kind free dictionary uses GPT-3. We also define future steps…

    Submitted 28 February, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  5. arXiv:2302.06426 [pdf, other]

    cs.CL cs.AI

    Linguistic ambiguity analysis in ChatGPT

    Authors: Miguel Ortega-Martín, Óscar García-Sierra, Alfonso Ardoiz, Jorge Álvarez, Juan Carlos Armenteros, Adrián Alonso

    Abstract: Linguistic ambiguity is and has always been one of the main challenges in Natural Language Processing (NLP) systems. Modern Transformer architectures like BERT, T5 or more recently InstructGPT have achieved some impressive improvements in many NLP fields, but there is still plenty of work to do. Motivated by the uproar caused by ChatGPT, in this paper we provide an introduction to linguistic ambig…

    Submitted 20 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.