Showing 1–1 of 1 results for author: Berestova, V

  1. arXiv:2505.24616

    cs.CL cs.AI

    Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX

    Authors: Nikita Martynov, Anastasia Mordasheva, Dmitriy Gorbetskiy, Danil Astafurov, Ulyana Isaeva, Elina Basyrova, Sergey Skachkov, Victoria Berestova, Nikolay Ivanov, Valeriia Zanina, Alena Fenogenova

    Abstract: We introduce POLLUX, a comprehensive open-source benchmark designed to evaluate the generative capabilities of large language models (LLMs) in Russian. Our main contribution is a novel evaluation methodology that enhances the interpretability of LLM assessment. For each task type, we define a set of detailed criteria and develop a scoring protocol where models evaluate responses and provide justif…

    Submitted 23 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

    Comments: 179 pages