Showing 1–10 of 10 results for author: Schmidtová, P

  1. arXiv:2507.11508  [pdf, ps, other]

    cs.CL

    Real-World Summarization: When Evaluation Reaches Its Limits

    Authors: Patrícia Schmidtová, Ondřej Dušek, Saad Mahamood

    Abstract: We examine evaluation of faithfulness to input data in the context of hotel highlights: brief LLM-generated summaries that capture unique features of accommodations. Through human evaluation campaigns involving categorical error assessment and span-level annotation, we compare traditional metrics, trainable methods, and LLM-as-a-judge approaches. Our findings reveal that simpler metrics like word…

    Submitted 15 July, 2025; originally announced July 2025.

  2. arXiv:2507.09509  [pdf, ps, other]

    cs.CL

    How Important is `Perfect' English for Machine Translation Prompts?

    Authors: Patrícia Schmidtová, Niyati Bafna, Seth Aycock, Gianluca Vico, Wiktor Kamzela, Katharina Hämmerl, Vilém Zouhar

    Abstract: Large language models (LLMs) have achieved top results in recent machine translation evaluations, but they are also known to be sensitive to errors and perturbations in their prompts. We systematically evaluate how both humanly plausible and synthetic errors in user prompts affect LLMs' performance on two related tasks: Machine translation and machine translation evaluation. We provide both a quan…

    Submitted 30 August, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

  3. arXiv:2504.08697  [pdf, ps, other]

    cs.CL

    Large Language Models as Span Annotators

    Authors: Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová, Ivan Kartáč, Kristýna Onderková, Ondřej Plátek, Dimitra Gkatzia, Saad Mahamood, Ondřej Dušek, Simone Balloccu

    Abstract: Span annotation is the task of localizing and classifying text spans according to custom guidelines. Annotated spans can be used to analyze and evaluate high-quality texts for which single-score metrics fail to provide actionable feedback. Until recently, span annotation was limited to human annotators or fine-tuned models. In this study, we show that large language models (LLMs) can serve as flex…

    Submitted 24 June, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  4. arXiv:2408.09169  [pdf, other]

    cs.CL

    Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

    Authors: Patrícia Schmidtová, Saad Mahamood, Simone Balloccu, Ondřej Dušek, Albert Gatt, Dimitra Gkatzia, David M. Howcroft, Ondřej Plátek, Adarsa Sivaprasad

    Abstract: Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey on the use of automatic metrics, focusing particularly on natural language generation (NLG) tasks. We inspect which metrics are used as well as why they are cho…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to INLG 2024

  5. arXiv:2407.17863  [pdf, other]

    cs.CL

    factgenie: A Framework for Span-based Evaluation of Generated Texts

    Authors: Zdeněk Kasner, Ondřej Plátek, Patrícia Schmidtová, Simone Balloccu, Ondřej Dušek

    Abstract: We present factgenie: a framework for annotating and visualizing word spans in textual model outputs. Annotations can capture various span-based phenomena such as semantic inaccuracies or irrelevant text. With factgenie, the annotations can be collected both from human crowdworkers and large language models. Our framework consists of a web interface for data visualization and gathering text annota…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to INLG 2024 (System Demonstrations)

  6. arXiv:2402.03927  [pdf, other]

    cs.CL cs.AI

    Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

    Authors: Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, Ondřej Dušek

    Abstract: Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but…

    Submitted 22 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted at EACL 2024 - main conference

  7. arXiv:2308.06502  [pdf, other]

    cs.CL cs.AI

    Three Ways of Using Large Language Models to Evaluate Chat

    Authors: Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek

    Abstract: This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition. We present three different approaches to predicting turn-level qualities of chatbot responses based on large language models (LLMs). We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT. We also analyze the performance of the other tw…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Accepted to DSTC11 workshop https://dstc11.dstc.community/

  8. arXiv:2206.08425  [pdf, other]

    cs.CL

    DialogueScript: Using Dialogue Agents to Produce a Script

    Authors: Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek

    Abstract: We present a novel approach to generating scripts by using agents with different personality types. To manage character interaction in the script, we employ simulated dramatic networks. Automatic and human evaluation on multiple criteria shows that our approach outperforms a vanilla-GPT2-based baseline. We further introduce a new metric to evaluate dialogue consistency based on natural language in…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Non-archival paper at the 4th Workshop on Narrative Understanding (WNU 2022)

  9. arXiv:2102.08892  [pdf, ps, other]

    cs.CL cs.HC

    THEaiTRE 1.0: Interactive generation of theatre play scripts

    Authors: Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka

    Abstract: We present the first version of a system for interactive generation of theatre play scripts. The system is based on a vanilla GPT-2 model with several adjustments, targeting specific issues we encountered in practice. We also list other issues we encountered but plan to only solve in a future version of the system. The presented system was used to generate a theatre play script planned for premier…

    Submitted 17 February, 2021; originally announced February 2021.

    Comments: Submitted to Text2Story workshop 2021

    Journal ref: Proc. Text2Story (2021) 71-76

  10. arXiv:2006.14668  [pdf, ps, other]

    cs.CL

    THEaiTRE: Artificial Intelligence to Write a Theatre Play

    Authors: Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká

    Abstract: We present THEaiTRE, a starting project aimed at automatic generation of theatre play scripts. This paper reviews related work and drafts an approach we intend to follow. We plan to adopt generative neural language models and hierarchical generation approaches, supported by summarization and machine translation methods, and complemented with a human-in-the-loop approach.

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: accepted to AI4Narratives2020

    Journal ref: Proc. AI4Narratives (2020) 9-13