Showing 1–12 of 12 results for author: Šimko, M

  1. arXiv:2506.21508

    cs.CL cs.AI cs.IR cs.LG

    skLEP: A Slovak General Language Understanding Benchmark

    Authors: Marek Šuppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marián Šimko

    Abstract: In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datase…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

    MSC Class: 68T50; ACM Class: I.2.7

  2. arXiv:2505.10740

    cs.CL cs.IR

    SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

    Authors: Qiwei Peng, Robert Moro, Michal Gregor, Ivan Srba, Simon Ostermann, Marian Simko, Juraj Podroužek, Matúš Mesarčík, Jaroslav Kopčan, Anders Søgaard

    Abstract: The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountere…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2503.02737

    cs.CL

    Large Language Models for Multilingual Previously Fact-Checked Claim Detection

    Authors: Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Tatiana Anikina, Michal Gregor, Marián Šimko

    Abstract: In our era of widespread false information, human fact-checkers often face the challenge of duplicating efforts when verifying claims that may have already been addressed in other countries or languages. As false information transcends linguistic boundaries, the ability to automatically detect previously fact-checked claims across languages has become an increasingly important task. This paper pre…

    Submitted 12 June, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  4. arXiv:2407.02351

    cs.CL

    Generative Large Language Models in Automated Fact-Checking: A Survey

    Authors: Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Marián Šimko

    Abstract: The dissemination of false information on online platforms presents a serious societal challenge. While manual fact-checking remains crucial, Large Language Models (LLMs) offer promising opportunities to support fact-checkers with their vast knowledge and advanced reasoning capabilities. This survey explores the application of generative LLMs in fact-checking, highlighting various approaches and t…

    Submitted 30 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  5. Soft Language Prompts for Language Transfer

    Authors: Ivan Vykopal, Simon Ostermann, Marián Šimko

    Abstract: Cross-lingual knowledge transfer, especially between high- and low-resource languages, remains challenging in natural language processing (NLP). This study offers insights for improving cross-lingual NLP applications through the combination of parameter-efficient fine-tuning methods. We systematically explore strategies for enhancing cross-lingual transfer through the incorporation of language-spe…

    Submitted 30 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Journal ref: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

  6. arXiv:2401.16575

    cs.CL cs.CV

    Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking

    Authors: Ivana Beňová, Jana Košecká, Michal Gregor, Martin Tamajka, Marcel Veselý, Marián Šimko

    Abstract: The dominant probing approaches rely on the zero-shot performance of image-text matching tasks to gain a finer-grained understanding of the representations learned by recent multimodal image-language transformer models. The evaluation is carried out on carefully curated datasets focusing on counting, relations, attributes, and others. This work introduces an alternative probing strategy called gui…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 9 pages of text, 11 pages total, 7 figures, 3 tables, preprint

  7. arXiv:2311.18711

    cs.CL

    Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling

    Authors: Matúš Pikuliak, Andrea Hrckova, Stefan Oresko, Marián Šimko

    Abstract: We present GEST -- a new manually created dataset designed to measure gender-stereotypical reasoning in language models and machine translation systems. GEST contains samples for 16 gender stereotypes about men and women (e.g., Women are beautiful, Men are leaders) that are compatible with the English language and 9 Slavic languages. The definition of said stereotypes was informed by gender expert…

    Submitted 30 September, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: EMNLP 2024 Findings

  8. arXiv:2301.01269

    cs.CL

    Average Is Not Enough: Caveats of Multilingual Evaluation

    Authors: Matúš Pikuliak, Marián Šimko

    Abstract: This position paper discusses the problem of multilingual evaluation. Using simple statistics, such as average language performance, might inject linguistic biases in favor of dominant language families into evaluation methodology. We argue that a qualitative analysis informed by comparative linguistics is needed for multilingual results to detect this kind of bias. We show in our case study that…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: The 2022 Workshop on Multilingual Representation Learning

  9. arXiv:2109.15254

    cs.CL

    SlovakBERT: Slovak Masked Language Model

    Authors: Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšták, Martin Tamajka, Viktor Bachratý, Marián Šimko, Pavol Balážik, Michal Trnka, Filip Uhlárik

    Abstract: We introduce a new Slovak masked language model called SlovakBERT. This is to our best knowledge the first paper discussing Slovak transformers-based language models. We evaluate our model on several NLP tasks and achieve state-of-the-art results. This evaluation is likewise the first attempt to establish a benchmark for Slovak language models. We publish the masked language model, as well as the…

    Submitted 29 October, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: 12 pages, 2 figures

  10. arXiv:1904.02981

    cs.CL

    NL-FIIT at SemEval-2019 Task 9: Neural Model Ensemble for Suggestion Mining

    Authors: Samuel Pecar, Marian Simko, Maria Bielikova

    Abstract: In this paper, we present neural model architecture submitted to the SemEval-2019 Task 9 competition: "Suggestion Mining from Online Reviews and Forums". We participated in both subtasks for domain specific and also cross-domain suggestion mining. We proposed a recurrent neural network architecture that employs Bi-LSTM layers and also self-attention mechanism. Our architecture tries to encode word…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted at the SemEval-2019 International Workshop on Semantic Evaluation

  11. arXiv:1809.06906

    cs.CL

    Improving Moderation of Online Discussions via Interpretable Neural Models

    Authors: Andrej Švec, Matúš Pikuliak, Marián Šimko, Mária Bieliková

    Abstract: Growing amount of comments make online discussions difficult to moderate by human moderators only. Antisocial behavior is a common occurrence that often discourages other users from participating in discussion. We propose a neural network based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we…

    Submitted 18 September, 2018; originally announced September 2018.

    Comments: ALW2

  12. Pushing the Limits of LTE: A Survey on Research Enhancing the Standard

    Authors: Stefan Schwarz, Josep Colom Ikuno, Michal Šimko, Martin Taranetz, Qi Wang, Markus Rupp

    Abstract: Cellular networks are an essential part of todays communication infrastructure. The ever-increasing demand for higher data-rates calls for a close cooperation between researchers and industry/standardization experts which hardly exists in practice. In this article we give an overview about our efforts in trying to bridge this gap. Our research group provides a standard-compliant open-source simula…

    Submitted 11 June, 2013; v1 submitted 30 December, 2012; originally announced December 2012.

    Comments: The final version of the manuscript is available at: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6514821&isnumber=6336544

    Journal ref: Schwarz, S.; Ikuno, J.C.; Simko, M.; Taranetz, M.; Wang, Q.; Rupp, M., "Pushing the Limits of LTE: A Survey on Research Enhancing the Standard," IEEE Access, vol. 1, pp. 51-62, 2013