-
skLEP: A Slovak General Language Understanding Benchmark
Authors:
Marek Šuppa,
Andrej Ridzik,
Daniel Hládek,
Tomáš Javůrek,
Viktória Ondrejová,
Kristína Sásiková,
Martin Tamajka,
Marián Šimko
Abstract:
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hope of fostering reproducibility and driving future research in Slovak NLU.
Submitted 26 June, 2025;
originally announced June 2025.
-
SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
Authors:
Qiwei Peng,
Robert Moro,
Michal Gregor,
Ivan Srba,
Simon Ostermann,
Marian Simko,
Juraj Podroužek,
Matúš Mesarčík,
Jaroslav Kopčan,
Anders Søgaard
Abstract:
The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountered claims expressed in social media posts across different languages. The task includes two subtracks: (1) a monolingual track, where social posts and claims are in the same language, and (2) a crosslingual track, where social posts and claims might be in different languages. A total of 179 participants registered for the task, contributing 52 test submissions, and 23 of the 31 teams submitted system papers. In this paper, we report the best-performing systems as well as the most common and the most effective approaches across both subtracks. This shared task, along with its dataset and participating systems, provides valuable insights into multilingual claim retrieval and automated fact-checking, supporting future research in this field.
Submitted 15 May, 2025;
originally announced May 2025.
-
Large Language Models for Multilingual Previously Fact-Checked Claim Detection
Authors:
Ivan Vykopal,
Matúš Pikuliak,
Simon Ostermann,
Tatiana Anikina,
Michal Gregor,
Marián Šimko
Abstract:
In our era of widespread false information, human fact-checkers often face the challenge of duplicating efforts when verifying claims that may have already been addressed in other countries or languages. As false information transcends linguistic boundaries, the ability to automatically detect previously fact-checked claims across languages has become an increasingly important task. This paper presents the first comprehensive evaluation of large language models (LLMs) for multilingual previously fact-checked claim detection. We assess seven LLMs across 20 languages in both monolingual and cross-lingual settings. Our results show that while LLMs perform well for high-resource languages, they struggle with low-resource languages. Moreover, translating original texts into English proved to be beneficial for low-resource languages. These findings highlight the potential of LLMs for multilingual previously fact-checked claim detection and provide a foundation for further research on this promising application of LLMs.
Submitted 12 June, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Generative Large Language Models in Automated Fact-Checking: A Survey
Authors:
Ivan Vykopal,
Matúš Pikuliak,
Simon Ostermann,
Marián Šimko
Abstract:
The dissemination of false information on online platforms presents a serious societal challenge. While manual fact-checking remains crucial, Large Language Models (LLMs) offer promising opportunities to support fact-checkers with their vast knowledge and advanced reasoning capabilities. This survey explores the application of generative LLMs in fact-checking, highlighting various approaches and techniques for prompting or fine-tuning these models. By providing an overview of existing methods and their limitations, the survey aims to enhance the understanding of how LLMs can be used in fact-checking and to facilitate further progress in their integration into the fact-checking process.
Submitted 30 October, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Soft Language Prompts for Language Transfer
Authors:
Ivan Vykopal,
Simon Ostermann,
Marián Šimko
Abstract:
Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains challenging in natural language processing (NLP). This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing cross-lingual transfer through the incorporation of language-specific and task-specific adapters and soft prompts. We present a detailed investigation of various combinations of these methods, exploring their efficiency across 16 languages and focusing on 10 mid- and low-resource languages. We further present, to our knowledge, the first use of soft prompts for language transfer, a technique we call soft language prompts. Our findings demonstrate that, in contrast to claims of previous work, a combination of language and task adapters does not always work best; instead, combining a soft language prompt with a task adapter outperforms most configurations in many cases.
Submitted 30 October, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
Authors:
Ivana Beňová,
Jana Košecká,
Michal Gregor,
Martin Tamajka,
Marcel Veselý,
Marián Šimko
Abstract:
The dominant probing approaches rely on the zero-shot performance of image-text matching tasks to gain a finer-grained understanding of the representations learned by recent multimodal image-language transformer models. The evaluation is carried out on carefully curated datasets focusing on counting, relations, attributes, and others. This work introduces an alternative probing strategy called guided masking. The proposed approach ablates different modalities using masking and assesses the model's ability to predict the masked word with high accuracy. We focus on studying multimodal models that consider regions of interest (ROI) features obtained by object detectors as input tokens. We probe the understanding of verbs using guided masking on ViLBERT, LXMERT, UNITER, and VisualBERT and show that these models can predict the correct verb with high accuracy. This contrasts with previous conclusions drawn from image-text matching probing techniques that frequently fail in situations requiring verb understanding. The code for all experiments will be publicly available at https://github.com/ivana-13/guided_masking.
Submitted 29 January, 2024;
originally announced January 2024.
-
Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling
Authors:
Matúš Pikuliak,
Andrea Hrckova,
Stefan Oresko,
Marián Šimko
Abstract:
We present GEST -- a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. GEST contains samples for 16 gender stereotypes about men and women (e.g., Women are beautiful, Men are leaders) that are compatible with the English language and 9 Slavic languages. The definitions of these stereotypes were informed by gender experts. We used GEST to evaluate English and Slavic masked LMs, English generative LMs, and machine translation systems. We discovered significant and consistent amounts of gender-stereotypical reasoning in almost all the evaluated models and languages. Our experiments confirm the previously postulated hypothesis that the larger the model, the more stereotypical it usually is.
Submitted 30 September, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Average Is Not Enough: Caveats of Multilingual Evaluation
Authors:
Matúš Pikuliak,
Marián Šimko
Abstract:
This position paper discusses the problem of multilingual evaluation. Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language families into evaluation methodology. We argue that a qualitative analysis informed by comparative linguistics is needed to detect this kind of bias in multilingual results. We show in our case study that results in published works can indeed be linguistically biased, and we demonstrate that visualization based on the URIEL typological database can detect it.
Submitted 3 January, 2023;
originally announced January 2023.
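The averaging bias described in the abstract above can be illustrated with a small Python sketch. This is not the paper's method (the authors argue for qualitative analysis and URIEL-based visualization); it only shows, using hypothetical scores and an assumed family assignment, how a plain mean over languages is pulled toward an over-represented language family, while averaging within families first exposes the weak one. All language scores and helper names below are invented for illustration.

```python
from statistics import mean

# Hypothetical per-language scores; invented for illustration, not taken from any paper.
SCORES = {"es": 0.90, "fr": 0.89, "it": 0.88, "pt": 0.87, "de": 0.86, "fi": 0.55}

# Assumed language-family assignment for the hypothetical languages above.
FAMILIES = {
    "es": "Romance", "fr": "Romance", "it": "Romance", "pt": "Romance",
    "de": "Germanic", "fi": "Uralic",
}


def plain_average(scores: dict[str, float]) -> float:
    """Mean over languages: each language counts once, so large families dominate."""
    return mean(scores.values())


def family_averages(scores: dict[str, float], families: dict[str, str]) -> dict[str, float]:
    """Mean score within each language family."""
    grouped: dict[str, list[float]] = {}
    for lang, score in scores.items():
        grouped.setdefault(families[lang], []).append(score)
    return {family: mean(vals) for family, vals in grouped.items()}


def family_balanced_average(scores: dict[str, float], families: dict[str, str]) -> float:
    """Mean of the family means: each family counts once, regardless of its size."""
    return mean(family_averages(scores, families).values())


if __name__ == "__main__":
    print(f"plain average:           {plain_average(SCORES):.3f}")
    print(f"family-balanced average: {family_balanced_average(SCORES, FAMILIES):.3f}")
    # Per-family breakdown, weakest family first.
    per_family = family_averages(SCORES, FAMILIES)
    for family, avg in sorted(per_family.items(), key=lambda kv: kv[1]):
        print(f"  {family:<8} {avg:.3f}")
```

With these made-up numbers the plain average is 0.825 while the family-balanced average is 0.765, and the per-family breakdown puts the single Uralic language at the bottom. Neither aggregate is "correct" in general; the point is that a single mean can hide which language families a system fails on.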
-
SlovakBERT: Slovak Masked Language Model
Authors:
Matúš Pikuliak,
Štefan Grivalský,
Martin Konôpka,
Miroslav Blšták,
Martin Tamajka,
Viktor Bachratý,
Marián Šimko,
Pavol Balážik,
Michal Trnka,
Filip Uhlárik
Abstract:
We introduce a new Slovak masked language model called SlovakBERT. To the best of our knowledge, this is the first paper discussing Slovak transformer-based language models. We evaluate our model on several NLP tasks and achieve state-of-the-art results. This evaluation is likewise the first attempt to establish a benchmark for Slovak language models. We publish the masked language model, as well as the fine-tuned models for part-of-speech tagging, sentiment analysis and semantic textual similarity.
Submitted 29 October, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining
Authors:
Samuel Pecar,
Marian Simko,
Maria Bielikova
Abstract:
In this paper, we present the neural model architecture we submitted to the SemEval-2019 Task 9 competition: "Suggestion Mining from Online Reviews and Forums". We participated in both subtasks: domain-specific and cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and a self-attention mechanism. Our architecture encodes words using ELMo word representations and ensembles multiple models to achieve better results. We performed experiments with different setups of our proposed model, including weighting of the prediction classes in the loss function. In the official test evaluation, our best model achieved scores of 0.6816 for subtask A and 0.6850 for subtask B. In the official results, we placed 12th and 10th in subtasks A and B, respectively.
Submitted 5 April, 2019;
originally announced April 2019.
-
Improving Moderation of Online Discussions via Interpretable Neural Models
Authors:
Andrej Švec,
Matúš Pikuliak,
Marián Šimko,
Mária Bieliková
Abstract:
The growing number of comments makes online discussions difficult for human moderators to handle alone. Antisocial behavior is a common occurrence that often discourages other users from participating in discussions. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we highlight inappropriate parts within these comments to make the moderation faster. We evaluated our method on data from a major Slovak news discussion platform.
Submitted 18 September, 2018;
originally announced September 2018.
-
Pushing the Limits of LTE: A Survey on Research Enhancing the Standard
Authors:
Stefan Schwarz,
Josep Colom Ikuno,
Michal Šimko,
Martin Taranetz,
Qi Wang,
Markus Rupp
Abstract:
Cellular networks are an essential part of today's communication infrastructure. The ever-increasing demand for higher data rates calls for close cooperation between researchers and industry/standardization experts, which hardly exists in practice. In this article, we give an overview of our efforts to bridge this gap. Our research group provides a standard-compliant open-source simulation platform for 3GPP LTE that enables reproducible research in a well-defined environment. We demonstrate that much innovative research within the confined framework of a real-world standard is still possible, and sometimes even encouraged. With exemplary samples of our research work, we investigate the potential of several important research areas under typical practical conditions.
Submitted 11 June, 2013; v1 submitted 30 December, 2012;
originally announced December 2012.