Showing 1–9 of 9 results for author: Kubis, M

  1. arXiv:2504.20581  [pdf, other]

    cs.CL

    ClonEval: An Open Voice Cloning Benchmark

    Authors: Iwona Christop, Tomasz Kuczyński, Marek Kubis

    Abstract: We present a novel benchmark for voice cloning text-to-speech models. The benchmark consists of an evaluation protocol, an open-source library for assessing the performance of voice cloning models, and an accompanying leaderboard. The paper discusses design considerations and presents a detailed description of the evaluation procedure. The usage of the software library is explained, along with the…

    Submitted 28 May, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: Under review at NeurIPS

  2. arXiv:2501.02266  [pdf, other]

    cs.CL cs.AI

    LLMzSzŁ: a comprehensive LLM benchmark for Polish

    Authors: Krzysztof Jassem, Michał Ciesiółka, Filip Graliński, Piotr Jabłoński, Jakub Pokrywka, Marek Kubis, Monika Jabłońska, Ryszard Staruch

    Abstract: This article introduces the first comprehensive benchmark for the Polish language at this scale: LLMzSzŁ (LLMs Behind the School Desk). It is based on a coherent collection of Polish national exams, including both academic and professional tests extracted from the archives of the Polish Central Examination Board. It covers 4 types of exams, coming from 154 domains. Altogether, it consists of almos…

    Submitted 4 January, 2025; originally announced January 2025.

  3. arXiv:2412.00559  [pdf, other]

    cs.CL cs.AI

    Polish Medical Exams: A new dataset for cross-lingual medical knowledge transfer assessment

    Authors: Łukasz Grzybowski, Jakub Pokrywka, Michał Ciesiółka, Jeremi I. Kaczmarek, Marek Kubis

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in handling specialized tasks, including medical problem-solving. However, most studies predominantly focus on English-language contexts. This study introduces a novel benchmark dataset based on Polish medical licensing and specialization exams (LEK, LDEK, PES) taken by medical doctor candidates and practicing doctors pursuing sp…

    Submitted 30 November, 2024; originally announced December 2024.

  4. arXiv:2407.01393  [pdf, other]

    cs.CL

    POLygraph: Polish Fake News Dataset

    Authors: Daniel Dzienisiewicz, Filip Graliński, Piotr Jabłoński, Marek Kubis, Paweł Skórzewski, Piotr Wierzchoń

    Abstract: This paper presents the POLygraph dataset, a unique resource for fake news detection in Polish. The dataset, created by an interdisciplinary team, is composed of two parts: the "fake-or-not" dataset with 11,360 pairs of news articles (identified by their URLs) and corresponding labels, and the "fake-they-say" dataset with 5,082 news articles (identified by their URLs) and tweets commenting on them…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 14 pages, 1 figure, accepted to the 14th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA'24)

  5. Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech

    Authors: Mateusz Czyżnikiewicz, Łukasz Bondaruk, Jakub Kubiak, Adam Wiącek, Łukasz Degórski, Marek Kubis, Paweł Skórzewski

    Abstract: In this paper we study the impact of augmenting spoken language corpora with domain-specific synthetic samples for the purpose of training a speech recognition system. Using both a conventional neural TTS system and a zero-shot one with voice cloning ability we generate speech corpora that vary in the number of voices. We compare speech recognition models trained with addition of different amounts…

    Submitted 29 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to FedCSIS 2024

  6. arXiv:2402.01300  [pdf, other]

    cs.CL

    Two Approaches to Diachronic Normalization of Polish Texts

    Authors: Kacper Dudzic, Filip Graliński, Krzysztof Jassem, Marek Kubis, Piotr Wierzchoń

    Abstract: This paper discusses two approaches to the diachronic normalization of Polish texts: a rule-based solution that relies on a set of handcrafted patterns, and a neural normalization model based on the text-to-text transfer transformer architecture. The training and evaluation data prepared for the task are discussed in detail, along with experiments conducted to compare the proposed normalization so…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to the LaTeCH-CLfL 2024 workshop

  7. arXiv:2310.16609  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors

    Authors: Marek Kubis, Paweł Skórzewski, Marcin Sowański, Tomasz Ziętkiewicz

    Abstract: In a spoken dialogue system, an NLU model is preceded by a speech recognition system that can deteriorate the performance of natural language understanding. This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models. The proposed method combines the back transcription procedure with a fine-grained technique for…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 main conference

  8. arXiv:2001.03041  [pdf, other]

    cs.CL cs.AI

    Open Challenge for Correcting Errors of Speech Recognition Systems

    Authors: Marek Kubis, Zygmunt Vetulani, Mikołaj Wypych, Tomasz Ziętkiewicz

    Abstract: The paper announces the new long-term challenge for improving the performance of automatic speech recognition systems. The goal of the challenge is to investigate methods of correcting the recognition results on the basis of previously made errors by the speech processing system. The dataset prepared for the task is described and evaluation criteria are presented.

    Submitted 9 January, 2020; originally announced January 2020.

    Journal ref: Vetulani, Zygmunt, Paroubek, Patrick (eds.): Proceedings of the 9th Language and Technology Conference, pp. 219-223, Wydawnictwo Nauka i Innowacje, Poznań, Poland, 2019

  9. arXiv:1705.05621  [pdf]

    cond-mat.mtrl-sci

    Application of flash method in the measurements of interfacial thermal resistance in layered and particulate composite materials

    Authors: Karol Pietrak, Tomasz S. Wiśniewski, Michał Kubiś

    Abstract: Presented study concerns the possibility of evaluation of interfacial thermal resistance (ITR) between the constituents in composite materials with the use of flash technique. Two variants of such measurement are considered, the first of which is the measurement of ITR between two bonded layers of different materials which had been studied before by various researchers. The second tested measureme…

    Submitted 16 May, 2017; originally announced May 2017.

    Comments: 28 pages, 12 figures, 6 tables, submitted to Thermochimica Acta