Showing 1–26 of 26 results for author: Chirkova, N

Searching in archive cs.
  1. arXiv:2507.23588

    cs.CL

    DiffLoRA: Differential Low-Rank Adapters for Large Language Models

    Authors: Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, Vassilina Nikoulina

    Abstract: Differential Transformer has recently been proposed to improve performance in Transformer models by canceling out noise through a denoiser attention mechanism. In this work, we introduce DiffLoRA, a parameter-efficient adaptation of the differential attention mechanism, with low-rank adapters on both positive and negative attention terms. This approach retains the efficiency of LoRA while aiming t…

    Submitted 31 July, 2025; originally announced July 2025.

  2. arXiv:2506.09147

    cs.CL cs.AI

    LLM-as-a-qualitative-judge: automating error analysis in natural language generation

    Authors: Nadezhda Chirkova, Tunde Oluwaseyi Ajayi, Seth Aycock, Zain Muhammad Mujahid, Vladana Perlić, Ekaterina Borisova, Markarit Vartampetian

    Abstract: Prompting large language models (LLMs) to evaluate generated text, known as LLM-as-a-judge, has become a standard evaluation approach in natural language generation (NLG), but is primarily used as a quantitative tool, i.e. with numerical scores as the main outputs. In this work, we propose LLM-as-a-qualitative-judge, an LLM-based evaluation approach with the main output being a structured report of co…

    Submitted 29 July, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  3. arXiv:2504.02411

    cs.CL

    Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation

    Authors: Alexandre Misrahi, Nadezhda Chirkova, Maxime Louis, Vassilina Nikoulina

    Abstract: Retrieval-Augmented Generation (RAG) enhances LLM factuality, but multi-domain applications face challenges such as the lack of diverse benchmarks and poor out-of-domain generalization. The first contribution of this work is to introduce a diverse benchmark comprising a variety of question-answering tasks from 8 sources and covering 13 domains. Our second contribution consists in systematically testing o…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 25 pages, 8 figures, 21 tables

  4. arXiv:2501.16214

    cs.CL cs.IR

    Provence: efficient and robust context pruning for retrieval-augmented generation

    Authors: Nadezhda Chirkova, Thibault Formal, Vassilina Nikoulina, Stéphane Clinchant

    Abstract: Retrieval-augmented generation improves various aspects of large language model (LLM) generation, but suffers from computational overhead caused by long contexts as well as the propagation of irrelevant retrieved information into generated responses. Context pruning deals with both aspects, by removing irrelevant parts of retrieved contexts before LLM generation. Existing context pruning approac…

    Submitted 27 January, 2025; originally announced January 2025.

    Comments: Accepted to ICLR 2025

  5. arXiv:2407.01463

    cs.CL cs.AI

    Retrieval-augmented generation in multilingual settings

    Authors: Nadezhda Chirkova, David Rau, Hervé Déjean, Thibault Formal, Stéphane Clinchant, Vassilina Nikoulina

    Abstract: Retrieval-augmented generation (RAG) has recently emerged as a promising solution for incorporating up-to-date or domain-specific knowledge into large language models (LLMs) and improving LLM factuality, but is predominantly studied in English-only settings. In this work, we consider RAG in the multilingual setting (mRAG), i.e. with user queries and the datastore in 13 languages, and investigate w…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.01126

    cs.CL cs.AI

    Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation

    Authors: Nadezhda Chirkova, Vassilina Nikoulina, Jean-Luc Meunier, Alexandre Bérard

    Abstract: We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training. We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling, which helps to accommodate a variety of multi-domain data,…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2407.01102

    cs.CL cs.IR

    BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

    Authors: David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, Stéphane Clinchant

    Abstract: Retrieval-Augmented Generation makes it possible to enhance Large Language Models with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed, which involve a large number of different configurations, such as evaluation datasets, collections, metrics, retrievers, and LLMs. Inconsistent benchmarking poses a major challenge in comparing approache…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 29 pages

  8. arXiv:2402.14778

    cs.CL cs.AI

    Zero-shot cross-lingual transfer in instruction tuning of large language models

    Authors: Nadezhda Chirkova, Vassilina Nikoulina

    Abstract: Instruction tuning (IT) is widely used to teach pretrained large language models (LLMs) to follow arbitrary instructions, but is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, when an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We advocate for the importance of eval…

    Submitted 22 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  9. arXiv:2402.12279

    cs.CL cs.AI

    Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks

    Authors: Nadezhda Chirkova, Vassilina Nikoulina

    Abstract: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approach…

    Submitted 22 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: NAACL 2024 final version. arXiv admin note: text overlap with arXiv:2310.09917

  10. arXiv:2310.09917

    cs.CL

    Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation

    Authors: Nadezhda Chirkova, Sheng Liang, Vassilina Nikoulina

    Abstract: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose…

    Submitted 22 April, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: This preprint describes a preliminary study for our follow-up work arXiv:2402.12279 (NAACL 2024), in which we investigate important factors for enabling zero-shot cross-lingual transfer in generative tasks

  11. arXiv:2308.00683

    cs.LG cs.CL cs.SE

    CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

    Authors: Nadezhda Chirkova, Sergey Troshin

    Abstract: Recent works have widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models, namely the effect of different subtokenization options, and aims to identify the most effe…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: Published at ICLR 2023

  12. arXiv:2306.17757

    cs.CL

    Should you marginalize over possible tokenizations?

    Authors: Nadezhda Chirkova, Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: Autoregressive language models (LMs) map token sequences to probabilities. The usual practice for computing the probability of any character string (e.g. English sentences) is to first transform it into a sequence of tokens that is scored by the model. However, there are exponentially many token sequences that represent any given string. To truly compute the probability of a string, one should marg…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023

  13. arXiv:2212.05901

    cs.CL cs.LG cs.SE

    Parameter-Efficient Finetuning of Transformers for Source Code

    Authors: Shamil Ayupov, Nadezhda Chirkova

    Abstract: Pretrained Transformers achieve state-of-the-art performance in various code-processing tasks but may be too large to be deployed. As software development tools often incorporate modules for various purposes which may potentially use a single instance of the pretrained model, it appears relevant to utilize parameter-efficient fine-tuning for the pretrained models of code. In this work, we test two…

    Submitted 12 December, 2022; originally announced December 2022.

  14. arXiv:2202.08975

    cs.SE cs.CL cs.LG

    Probing Pretrained Models of Source Code

    Authors: Sergey Troshin, Nadezhda Chirkova

    Abstract: Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully built to solve a particular code processing task. However, general pretrained models such as CodeBERT or CodeT5 have recently been shown to outperform task-specific models in many applications. While pretrained…

    Submitted 17 November, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  15. arXiv:2112.14423

    eess.SP cs.LG cs.NI

    Machine Learning Methods for Spectral Efficiency Prediction in Massive MIMO Systems

    Authors: Evgeny Bobrov, Sergey Troshin, Nadezhda Chirkova, Ekaterina Lobacheva, Sviatoslav Panchenko, Dmitry Vetrov, Dmitry Kropotov

    Abstract: Channel decoding, channel detection, channel assessment, and resource management for wireless multiple-input multiple-output (MIMO) systems are all examples of problems where machine learning (ML) can be successfully applied. In this paper, we study several ML approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: To appear in Optimization Methods & Software, 22 pages, 10 figures, 2 tables

  16. arXiv:2107.10143

    cs.LG stat.ML

    On the Memorization Properties of Contrastive Learning

    Authors: Ildus Sadrtdinov, Nadezhda Chirkova, Ekaterina Lobacheva

    Abstract: Memorization studies of deep neural networks (DNNs) help to understand what patterns DNNs learn and how, and motivate improvements to DNN training approaches. In this work, we investigate the memorization properties of SimCLR, a widely used contrastive self-supervised learning approach, and compare them to the memorization of supervised learning and training on random labels. We find that both tra…

    Submitted 21 July, 2021; originally announced July 2021.

    Comments: Published at the Workshop on Overparameterization: Pitfalls & Opportunities, ICML 2021

  17. arXiv:2106.15739

    cs.LG stat.ML

    On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

    Authors: Ekaterina Lobacheva, Maxim Kodryan, Nadezhda Chirkova, Andrey Malinin, Dmitry Vetrov

    Abstract: Training neural networks with batch normalization and weight decay has become a common practice in recent years. In this work, we show that their combined use may result in a surprising periodic behavior of optimization dynamics: the training process regularly exhibits destabilizations that, however, do not lead to complete divergence but cause a new period of training. We rigorously investigate t…

    Submitted 15 January, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021. The first two authors contributed equally.

  18. arXiv:2010.12693

    cs.SE cs.CL cs.LG

    On the Embeddings of Variables in Recurrent Neural Networks for Source Code

    Authors: Nadezhda Chirkova

    Abstract: Source code processing relies heavily on methods widely used in natural language processing (NLP), but involves specifics that need to be taken into account to achieve higher quality. An example of this specificity is that the semantics of a variable is defined not only by its name but also by the contexts in which the variable occurs. In this work, we develop dynamic embeddings, a recurrent m…

    Submitted 27 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Published at the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021)

  19. arXiv:2010.12663

    cs.SE cs.LG

    A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

    Authors: Nadezhda Chirkova, Sergey Troshin

    Abstract: There is emerging interest in applying natural language processing models to source code processing tasks. One of the major problems in applying deep learning to software engineering is that source code often contains many rare identifiers, resulting in huge vocabularies. We propose a simple yet effective method, based on identifier anonymization, to handle out-of-vocabulary (OOV…

    Submitted 27 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Published at the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021)

  20. arXiv:2010.07987

    cs.LG cs.CL cs.SE

    Empirical Study of Transformers for Source Code

    Authors: Nadezhda Chirkova, Sergey Troshin

    Abstract: Initially developed for natural language processing (NLP), Transformers are now widely used for source code processing, due to the format similarity between source code and text. In contrast to natural language, source code is strictly structured, i.e., it follows the syntax of the programming language. Several recent works develop Transformer modifications for capturing syntactic information in s…

    Submitted 24 June, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Published at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2021 (ESEC/FSE'21)

  21. arXiv:2007.08483

    cs.LG stat.ML

    On Power Laws in Deep Ensembles

    Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Maxim Kodryan, Dmitry Vetrov

    Abstract: Ensembles of deep neural networks are known to achieve state-of-the-art performance in uncertainty estimation and to improve accuracy. In this work, we focus on a classification problem and investigate the behavior of both non-calibrated and calibrated negative log-likelihood (CNLL) of a deep ensemble as a function of the ensemble size and the member network size. We indicate the conditio…

    Submitted 28 June, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Published at NeurIPS 2020 and at the Workshop on Uncertainty and Robustness in Deep Learning, ICML 2020

  22. arXiv:2005.07292

    cs.LG stat.ML

    Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?

    Authors: Nadezhda Chirkova, Ekaterina Lobacheva, Dmitry Vetrov

    Abstract: One of the generally accepted views of modern deep learning is that increasing the number of parameters usually leads to better quality. The two easiest ways to increase the number of parameters are to increase the size of the network, e.g. width, or to train a deep ensemble; both approaches improve the performance in practice. In this work, we consider a fixed memory budget setting, and investigat…

    Submitted 14 May, 2020; originally announced May 2020.

    Comments: Under review by the International Conference on Machine Learning (ICML 2020)

  23. arXiv:1911.05585

    cs.LG cs.CL stat.ML

    Structured Sparsification of Gated Recurrent Neural Networks

    Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Alexander Markovich, Dmitry Vetrov

    Abstract: Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units of networks, e.g. neurons. We adjust the existing sparsification approaches to the gated recurrent architectures. Specifically, in addition to the sparsification of weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: Published at the Workshop on Context and Compositionality in Biological and Artificial Neural Systems, NeurIPS 2019

  24. arXiv:1812.05692

    cs.LG cs.CL stat.ML

    Bayesian Sparsification of Gated Recurrent Neural Networks

    Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov

    Abstract: Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structural units from the networks, e.g. neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. This makes some g…

    Submitted 12 December, 2018; originally announced December 2018.

    Comments: Published at the Workshop on Compact Deep Neural Networks with industrial applications, NeurIPS 2018

  25. arXiv:1810.10927

    cs.CL cs.LG stat.ML

    Bayesian Compression for Natural Language Processing

    Authors: Nadezhda Chirkova, Ekaterina Lobacheva, Dmitry Vetrov

    Abstract: In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters. The majority of these parameters are often concentrated in the embedding layer, whose size grows proportionally to the vocabulary size. We propose a Bayesian sparsification technique for RNNs which allows compressing the RNN dozens or hundre…

    Submitted 12 December, 2018; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: Published in EMNLP 2018

  26. arXiv:1708.00077

    stat.ML cs.CL cs.LG

    Bayesian Sparsification of Recurrent Neural Networks

    Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov

    Abstract: Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. Recently proposed Sparse Variational Dropout eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural networks. To account for the specifics of recurrent networks, we als…

    Submitted 31 July, 2017; originally announced August 2017.

    Comments: Published at the Workshop on Learning to Generate Natural Language, ICML 2017