GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture
Authors:
GigaChat team,
Mamedov Valentin,
Evgenii Kosarev,
Gregory Leleytner,
Ilya Shchuckin,
Valeriy Berezovskiy,
Daniil Smirnov,
Dmitry Kozlov,
Sergei Averkiev,
Lukyanenko Ivan,
Aleksandr Proshunin,
Ainur Israfilova,
Ivan Baskov,
Artem Chervyakov,
Emil Shakirov,
Mikhail Kolesov,
Daria Khomich,
Darya Latortseva,
Sergei Porkhun,
Yury Fedorov,
Oleg Kutuzov,
Polina Kudriavtseva,
Sofiia Soldatova,
Kolodin Egor,
Stanislav Pyatkin,
et al. (9 additional authors not shown)
Abstract:
Generative large language models (LLMs) have become crucial for modern NLP research and applications across various languages. However, the development of foundational models specifically tailored to the Russian language has been limited, primarily due to the significant computational resources required. This paper introduces the GigaChat family of Russian LLMs, available in various sizes, including base models and instruction-tuned versions. We provide a detailed report on the model architecture, pre-training process, and experiments to guide design choices. In addition, we evaluate their performance on Russian and English benchmarks and compare GigaChat with multilingual analogs. The paper presents a system demonstration of the top-performing models accessible via an API, a Telegram bot, and a Web interface. Furthermore, we have released three GigaChat models as open source (https://huggingface.co/ai-sage), aiming to expand NLP research opportunities and support the development of industrial solutions for the Russian language.
Submitted 11 June, 2025;
originally announced June 2025.
MERA: A Comprehensive LLM Evaluation in Russian
Authors:
Alena Fenogenova,
Artem Chervyakov,
Nikita Martynov,
Anastasia Kozlova,
Maria Tikhonova,
Albina Akhmetgareeva,
Anton Emelyanov,
Denis Shevelev,
Pavel Lebedev,
Leonid Sinev,
Ulyana Isaeva,
Katerina Kolomeytseva,
Daniil Moskovskiy,
Elizaveta Goncharova,
Nikita Savushkin,
Polina Mikhailova,
Denis Dimitrov,
Alexander Panchenko,
Sergei Markov
Abstract:
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.
Submitted 2 August, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.