
Showing 1–13 of 13 results for author: Franco-Salvador, M

Searching in archive cs.
  1. arXiv:2504.16921

    cs.CL

    IberBench: LLM Evaluation on Iberian Languages

    Authors: José Ángel González, Ian Borrego Obrador, Álvaro Romo Herrero, Areg Mikael Sarvazyan, Mara Chinea-Ríos, Angelo Basile, Marc Franco-Salvador

    Abstract: Large Language Models (LLMs) remain difficult to evaluate comprehensively, particularly for languages other than English, where high-quality data is often limited. Existing benchmarks and leaderboards are predominantly English-centric, with only a few addressing other languages. These benchmarks fall short in several key areas: they overlook the diversity of language varieties, prioritize fundamen…

    Submitted 23 April, 2025; originally announced April 2025.

  2. arXiv:2401.03946

    cs.CL

    TextMachina: Seamless Generation of Machine-Generated Text Datasets

    Authors: Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador

    Abstract: Recent advancements in Large Language Models (LLMs) have led to high-quality Machine-Generated Text (MGT), giving rise to countless new use cases and applications. However, easy access to LLMs is posing new challenges due to misuse. To address malicious usage, researchers have released datasets to effectively train models on MGT-related tasks. Similar strategies are used to compile these datasets,…

    Submitted 12 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 14 pages, 10 figures

  3. arXiv:2309.11285

    cs.CL cs.AI cs.LG

    Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains

    Authors: Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso

    Abstract: This paper presents the overview of the AuTexTification shared task as part of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks: for Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to a…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted at SEPLN 2023

    Journal ref: Procesamiento del Lenguaje Natural, [S.l.], v. 71, p. 275-288, sep. 2023

  4. arXiv:2211.11554

    cs.CL

    Programming by Example and Text-to-Code Translation for Conversational Code Generation

    Authors: Eli Whitehouse, William Gerard, Yauhen Klimovich, Marc Franco-Salvador

    Abstract: Dialogue systems is an increasingly popular task of natural language processing. However, the dialogue paths tend to be deterministic, restricted to the system rails, regardless of the given request or input text. Recent advances in program synthesis have led to systems which can synthesize programs from very general search spaces, e.g. Programming by Example, and to systems with very accessible i…

    Submitted 18 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 13 pages, 2 figures, conference preprint

  5. arXiv:2204.10543

    cs.CL

    Zero and Few-shot Learning for Author Profiling

    Authors: Mara Chinea-Rios, Thomas Müller, Gretel Liz De la Peña Sarracén, Francisco Rangel, Marc Franco-Salvador

    Abstract: Author profiling classifies author characteristics by analyzing how language is shared among people. In this work, we study that task from a low-resource viewpoint: using little or no training data. We explore different zero and few-shot models based on entailment and evaluate our systems on several profiling tasks in Spanish and English. In addition, we study the effect of both the entailment hyp…

    Submitted 17 May, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

  6. arXiv:2204.09481

    cs.CL cs.LG

    Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers

    Authors: Angelo Basile, Marc Franco-Salvador, Paolo Rosso

    Abstract: Zero-shot text classifiers based on label descriptions embed an input text and a set of labels into the same space: measures such as cosine similarity can then be used to select the most similar label description to the input text as the predicted label. In a true zero-shot setup, designing good label descriptions is challenging because no development set is available. Inspired by the literature o…

    Submitted 24 May, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: 6 pages, 2 figures

    MSC Class: I.2.7

  7. arXiv:2204.09347

    cs.CL

    Active Few-Shot Learning with FASL

    Authors: Thomas Müller, Guillermo Pérez-Torró, Angelo Basile, Marc Franco-Salvador

    Abstract: Recent advances in natural language processing (NLP) have led to strong text classification models for many tasks. However, still often thousands of examples are needed to train models with good quality. This makes it challenging to quickly develop and deploy new models for real world problems and business needs. Few-shot learning and active learning are two lines of research, aimed at tackling th…

    Submitted 17 May, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  8. arXiv:2203.14655

    cs.CL cs.LG

    Few-Shot Learning with Siamese Networks and Label Tuning

    Authors: Thomas Müller, Guillermo Pérez-Torró, Marc Franco-Salvador

    Abstract: We study the problem of building text classifiers with little or no training data, commonly known as zero and few-shot text classification. In recent years, an approach based on neural textual entailment models has been found to give strong results on a diverse range of tasks. In this work, we show that with proper pre-training, Siamese Networks that embed texts and labels offer a competitive alte…

    Submitted 20 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  9. arXiv:2012.09692

    cs.CL

    Five Psycholinguistic Characteristics for Better Interaction with Users

    Authors: Sanja Štajner, Seren Yenikent, Marc Franco-Salvador

    Abstract: When two people pay attention to each other and are interested in what the other has to say or write, they almost instantly adapt their writing/speaking style to match the other. For a successful interaction with a user, chatbots and dialogue systems should be able to do the same. We propose a framework consisting of five psycholinguistic textual characteristics for better human-computer interacti…

    Submitted 21 March, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: 26 pages, 4 figures

  10. arXiv:1807.11584

    cs.CL

    UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering

    Authors: Marc Franco-Salvador, Sudipta Kar, Thamar Solorio, Paolo Rosso

    Abstract: In this work we describe the system built for the three English subtasks of the SemEval 2016 Task 3 by the Department of Computer Science of the University of Houston (UH) and the Pattern Recognition and Human Language Technology (PRHLT) research center - Universitat Politècnica de València: UH-PRHLT. Our system represents instances by using both lexical and semantic-based similarity measures be…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: Top system for question-question similarity in SemEval 2016 Task 3

  11. Semantically-informed distance and similarity measures for paraphrase plagiarism identification

    Authors: Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor-Pineda

    Abstract: Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures are able to extract semantic inform…

    Submitted 29 May, 2018; originally announced May 2018.

    Journal ref: Journal of Intelligent & Fuzzy Systems, vol. 34, no. 5, pp. 2983-2990, 2018

  12. arXiv:1801.06436

    cs.CL

    A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

    Authors: Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

    Abstract: Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named ent…

    Submitted 19 January, 2018; originally announced January 2018.

    Comments: Accepted for publication in Knowledge-Based Systems journal

  13. arXiv:1705.10754

    cs.CL

    A Low Dimensionality Representation for Language Variety Identification

    Authors: Francisco Rangel, Marc Franco-Salvador, Paolo Rosso

    Abstract: Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with its specific variation (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work we propose a low dimensionality representation (LDR) to address this task with five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. We compare our L…

    Submitted 30 May, 2017; originally announced May 2017.

    Journal ref: CICLing - Computational Linguistics and Intelligent Text Processing, 2016
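The abstract of result 6 (arXiv:2204.09481) describes how zero-shot classifiers based on label descriptions pick a label: the input text and every label description are embedded into the same space, and the label whose description is most cosine-similar to the text is predicted. The sketch below illustrates only that selection rule, assuming the embeddings have already been computed by some encoder. The function names `cosine` and `predict_label`, the label names, and the 3-dimensional vectors are hypothetical stand-ins; this is not the paper's unsupervised ranking and aggregation method.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def predict_label(text_emb: Sequence[float],
                  label_embs: Dict[str, Sequence[float]]) -> str:
    """Return the label whose description embedding is most similar to the text."""
    return max(label_embs, key=lambda label: cosine(text_emb, label_embs[label]))


# Hypothetical embeddings standing in for the output of a real text encoder.
label_embeddings = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.1, 0.9, 0.1],
    "technology": [0.0, 0.2, 0.95],
}
text_embedding = [0.2, 0.1, 0.9]

print(predict_label(text_embedding, label_embeddings))  # prints "technology"
```

Because cosine similarity ignores vector length, only the direction of each embedding matters, which is why it is a common choice when texts and label descriptions of very different lengths share one space.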