Search | arXiv e-print repository

skLEP: A Slovak General Language Understanding Benchmark

Authors: Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko

Abstract: In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we also release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and driving future research in Slovak NLU.

Submitted 26 June, 2025; originally announced June 2025.

Comments: ACL 2025 Findings

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2505.10740 [pdf, ps, other]

SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

Authors: Qiwei Peng, Robert Moro, Michal Gregor, Ivan Srba, Simon Ostermann, Marian Simko, Juraj Podroužek, Matúš Mesarčík, Jaroslav Kopčan, Anders Søgaard

Abstract: The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountered claims expressed in social media posts across different languages. The task includes two subtracks: (1) a monolingual track, where social posts and claims are in the same language, and (2) a crosslingual track, where social posts and claims might be in different languages. A total of 179 participants registered for the task contributing to 52 test submissions. 23 out of 31 teams have submitted their system papers. In this paper, we report the best-performing systems as well as the most common and the most effective approaches across both subtracks. This shared task, along with its dataset and participating systems, provides valuable insights into multilingual claim retrieval and automated fact-checking, supporting future research in this field.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2503.02737 [pdf, ps, other]

Large Language Models for Multilingual Previously Fact-Checked Claim Detection

Authors: Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Tatiana Anikina, Michal Gregor, Marián Šimko

Abstract: In our era of widespread false information, human fact-checkers often face the challenge of duplicating efforts when verifying claims that may have already been addressed in other countries or languages. As false information transcends linguistic boundaries, the ability to automatically detect previously fact-checked claims across languages has become an increasingly important task. This paper presents the first comprehensive evaluation of large language models (LLMs) for multilingual previously fact-checked claim detection. We assess seven LLMs across 20 languages in both monolingual and cross-lingual settings. Our results show that while LLMs perform well for high-resource languages, they struggle with low-resource languages. Moreover, translating original texts into English proved to be beneficial for low-resource languages. These findings highlight the potential of LLMs for multilingual previously fact-checked claim detection and provide a foundation for further research on this promising application of LLMs.

Submitted 12 June, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

arXiv:2407.02351 [pdf, other]

Generative Large Language Models in Automated Fact-Checking: A Survey

Authors: Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Marián Šimko

Abstract: The dissemination of false information on online platforms presents a serious societal challenge. While manual fact-checking remains crucial, Large Language Models (LLMs) offer promising opportunities to support fact-checkers with their vast knowledge and advanced reasoning capabilities. This survey explores the application of generative LLMs in fact-checking, highlighting various approaches and techniques for prompting or fine-tuning these models. By providing an overview of existing methods and their limitations, the survey aims to enhance the understanding of how LLMs can be used in fact-checking and to facilitate further progress in their integration into the fact-checking process.

Submitted 30 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02317 [pdf, other]

doi 10.18653/v1/2025.naacl-long.517

Soft Language Prompts for Language Transfer

Authors: Ivan Vykopal, Simon Ostermann, Marián Šimko

Abstract: Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains challenging in natural language processing (NLP). This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing cross-lingual transfer through the incorporation of language-specific and task-specific adapters and soft prompts. We present a detailed investigation of various combinations of these methods, exploring their efficiency across 16 languages, focusing on 10 mid- and low-resource languages. We further present, to our knowledge, the first use of soft prompts for language transfer, a technique we call soft language prompts. Our findings demonstrate that, in contrast to claims of previous work, a combination of language and task adapters does not always work best; instead, combining a soft language prompt with a task adapter outperforms most configurations in many cases.

Submitted 30 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Journal ref: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

arXiv:2401.16575 [pdf, other]

Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking

Authors: Ivana Beňová, Jana Košecká, Michal Gregor, Martin Tamajka, Marcel Veselý, Marián Šimko

Abstract: The dominant probing approaches rely on the zero-shot performance of image-text matching tasks to gain a finer-grained understanding of the representations learned by recent multimodal image-language transformer models. The evaluation is carried out on carefully curated datasets focusing on counting, relations, attributes, and others. This work introduces an alternative probing strategy called guided masking. The proposed approach ablates different modalities using masking and assesses the model's ability to predict the masked word with high accuracy. We focus on studying multimodal models that consider regions of interest (ROI) features obtained by object detectors as input tokens. We probe the understanding of verbs using guided masking on ViLBERT, LXMERT, UNITER, and VisualBERT and show that these models can predict the correct verb with high accuracy. This contrasts with previous conclusions drawn from image-text matching probing techniques that frequently fail in situations requiring verb understanding. The code for all experiments will be publicly available at https://github.com/ivana-13/guided_masking.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 9 pages of text, 11 pages total, 7 figures, 3 tables, preprint

arXiv:2311.18711 [pdf, other]

Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling

Authors: Matúš Pikuliak, Andrea Hrckova, Stefan Oresko, Marián Šimko

Abstract: We present GEST -- a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. GEST contains samples for 16 gender stereotypes about men and women (e.g., Women are beautiful, Men are leaders) that are compatible with the English language and 9 Slavic languages. The definition of said stereotypes was informed by gender experts. We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems. We discovered significant and consistent amounts of gender-stereotypical reasoning in almost all the evaluated models and languages. Our experiments confirm the previously postulated hypothesis that the larger the model, the more stereotypical it usually is.

Submitted 30 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: EMNLP 2024 Findings

arXiv:2301.01269 [pdf, other]

Average Is Not Enough: Caveats of Multilingual Evaluation

Authors: Matúš Pikuliak, Marián Šimko

Abstract: This position paper discusses the problem of multilingual evaluation. Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language families into evaluation methodology. We argue that a qualitative analysis informed by comparative linguistics is needed for multilingual results to detect this kind of bias. We show in our case study that results in published works can indeed be linguistically biased and we demonstrate that visualization based on the URIEL typological database can detect it.

Submitted 3 January, 2023; originally announced January 2023.

Comments: The 2022 Workshop on Multilingual Representation Learning

arXiv:2109.15254 [pdf, other]

SlovakBERT: Slovak Masked Language Model

Authors: Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marián Šimko, Pavol Balážik, Michal Trnka, Filip Uhlárik

Abstract: We introduce a new Slovak masked language model called SlovakBERT. This is, to the best of our knowledge, the first paper discussing Slovak transformer-based language models. We evaluate our model on several NLP tasks and achieve state-of-the-art results. This evaluation is likewise the first attempt to establish a benchmark for Slovak language models. We publish the masked language model, as well as the fine-tuned models for part-of-speech tagging, sentiment analysis and semantic textual similarity.

Submitted 29 October, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: 12 pages, 2 figures

arXiv:1904.02981 [pdf, other]

NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining

Authors: Samuel Pecar, Marian Simko, Maria Bielikova

Abstract: In this paper, we present the neural model architecture submitted to the SemEval-2019 Task 9 competition: "Suggestion Mining from Online Reviews and Forums". We participated in both subtasks, for domain-specific and cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and a self-attention mechanism. Our architecture encodes words via word representations using ELMo and ensembles multiple models to achieve better results. We performed experiments with different setups of our proposed model involving weighting of prediction classes for the loss function. In the official test evaluation, our best model achieved a score of 0.6816 for subtask A and 0.6850 for subtask B. In the official results, we achieved 12th and 10th place in subtasks A and B, respectively.

Submitted 5 April, 2019; originally announced April 2019.

Comments: Accepted at the SemEval-2019 International Workshop on Semantic Evaluation

arXiv:1809.06906 [pdf, other]

Improving Moderation of Online Discussions via Interpretable Neural Models

Authors: Andrej Švec, Matúš Pikuliak, Marián Šimko, Mária Bieliková

Abstract: The growing number of comments makes online discussions difficult for human moderators alone to moderate. Antisocial behavior is a common occurrence that often discourages other users from participating in discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we highlight inappropriate parts within these comments to make the moderation faster. We evaluated our method on data from a major Slovak news discussion platform.

Submitted 18 September, 2018; originally announced September 2018.

Comments: ALW2

arXiv:1212.6734 [pdf, ps, other]

doi 10.1109/ACCESS.2013.2260371

Pushing the Limits of LTE: A Survey on Research Enhancing the Standard

Authors: Stefan Schwarz, Josep Colom Ikuno, Michal Šimko, Martin Taranetz, Qi Wang, Markus Rupp

Abstract: Cellular networks are an essential part of today's communication infrastructure. The ever-increasing demand for higher data rates calls for close cooperation between researchers and industry/standardization experts, which hardly exists in practice. In this article we give an overview of our efforts in trying to bridge this gap. Our research group provides a standard-compliant open-source simulation platform for 3GPP LTE that enables reproducible research in a well-defined environment. We demonstrate that much innovative research under the confined framework of a real-world standard is still possible, sometimes even encouraged. With exemplary samples of our research work we investigate the potential of several important research areas under typical practical conditions.

Submitted 11 June, 2013; v1 submitted 30 December, 2012; originally announced December 2012.

Comments: The final version of the manuscript is available at: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6514821&isnumber=6336544

Journal ref: Schwarz, S.; Ikuno, J.C.; Simko, M.; Taranetz, M.; Wang, Q.; Rupp, M., "Pushing the Limits of LTE: A Survey on Research Enhancing the Standard," IEEE Access, vol. 1, pp. 51-62, 2013

Showing 1–12 of 12 results for author: Šimko, M