-
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Authors:
Alexey Skrynnik,
Anton Andreychuk,
Anatolii Borzilov,
Alexander Chernyavskiy,
Konstantin Yakovlev,
Aleksandr Panov
Abstract:
Multi-agent reinforcement learning (MARL) has recently excelled in solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g., heuristic search), are now being tackled with learning-based or hybrid methods. However, in this domain, it remains difficult, if not impossible, to conduct a fair comparison between classical, learning-based, and hybrid approaches due to the lack of a unified framework that supports both learning and evaluation. To address this, we introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, a collection of predefined problem instances, a visualization toolkit, and a benchmarking tool for automated evaluation. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators (such as success rate and path length), enabling a fair multi-fold comparison. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
Submitted 8 April, 2025; v1 submitted 20 July, 2024;
originally announced July 2024.
-
Self-supervised Physics-based Denoising for Computed Tomography
Authors:
Elvira Zainulina,
Alexey Chernyavskiy,
Dmitry V. Dylov
Abstract:
Computed Tomography (CT) poses a risk to patients due to its inherent X-ray radiation, stimulating the development of low-dose CT (LDCT) imaging methods. Lowering the radiation dose reduces the health risks but leads to noisier measurements, which decreases the tissue contrast and causes artifacts in CT images. Ultimately, these issues could affect the perception of medical personnel and could cause misdiagnosis. Modern deep learning noise suppression methods alleviate the challenge but require paired low-noise and high-noise CT images for training, which are rarely collected in regular clinical workflows. In this work, we introduce a new self-supervised approach for CT denoising, Noise2NoiseTD-ANM, that can be trained without the high-dose CT projection ground truth images. Unlike previously proposed self-supervised techniques, the introduced method exploits the connections between the adjacent projections and the actual model of CT noise distribution. Such a combination allows for interpretable no-reference denoising using nothing but the original noisy LDCT projections. Our experiments with LDCT data demonstrate that the proposed method reaches the level of the fully supervised models, sometimes surpassing them, easily generalizes to various noise levels, and outperforms state-of-the-art self-supervised denoising algorithms.
Submitted 1 November, 2022;
originally announced November 2022.
-
CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media
Authors:
Momchil Hardalov,
Anton Chernyavskiy,
Ivan Koychev,
Dmitry Ilvovsky,
Preslav Nakov
Abstract:
While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return an article that explains their decision. This is a sensible approach as people trust manual fact-checking, and as many claims are repeated multiple times. Yet, a major issue when building such systems is the small number of known pairs of tweets and verifying articles available for training. Here, we aim to bridge this gap by making use of crowd fact-checking, i.e., mining claims in social media for which users have responded with a link to a fact-checking article. In particular, we mine a large-scale collection of 330,000 tweets, each paired with a corresponding fact-checking article. We further propose an end-to-end framework to learn from this noisy data based on modified self-adaptive training, in a distant supervision scenario. Our experiments on the CLEF'21 CheckThat! test set show improvements over the state of the art by two points absolute. Our code and datasets are available at https://github.com/mhardalov/crowdchecked-claims
Submitted 10 October, 2022;
originally announced October 2022.
-
Batch-Softmax Contrastive Loss for Pairwise Sentence Scoring Tasks
Authors:
Anton Chernyavskiy,
Dmitry Ilvovsky,
Pavel Kalinin,
Preslav Nakov
Abstract:
The use of contrastive loss for representation learning has become prominent in computer vision, and it is now getting attention in Natural Language Processing (NLP). Here, we explore the idea of using a batch-softmax contrastive loss when fine-tuning large-scale pre-trained transformer models to learn better task-specific sentence embeddings for pairwise sentence scoring tasks. We introduce and study a number of variations in the calculation of the loss as well as in the overall training procedure; in particular, we find that data shuffling can be quite important. Our experimental results show sizable improvements on a number of datasets and pairwise sentence scoring tasks including classification, ranking, and regression. Finally, we offer detailed analysis and discussion, which should be useful for researchers aiming to explore the utility of contrastive loss in NLP.
Submitted 10 October, 2021;
originally announced October 2021.
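
For readers unfamiliar with the term, the following is a minimal sketch of one common form of a batch-softmax contrastive loss, not the authors' implementation. The abstract says several variations of the loss were studied and does not specify them, so everything here is an assumption made for illustration: the function names, the dot-product similarity, the `temperature` parameter, and the symmetric averaging over rows and columns. Each pair `(left[i], right[i])` is treated as a positive, and every other item in the batch serves as a negative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def batch_softmax_contrastive_loss(left, right, temperature=1.0):
    """Hypothetical helper: symmetric in-batch softmax contrastive loss.

    left[i] and right[i] are embeddings of the two sentences in pair i.
    The diagonal of the score matrix holds the positives; all other
    entries in the same row or column act as in-batch negatives.
    """
    n = len(left)
    scores = [[dot(left[i], right[j]) / temperature for j in range(n)]
              for i in range(n)]
    # Cross-entropy over rows: match each left item to its right partner.
    row_loss = -sum(log_softmax(scores[i])[i] for i in range(n)) / n
    # Cross-entropy over columns: match each right item to its left partner.
    cols = [[scores[i][j] for i in range(n)] for j in range(n)]
    col_loss = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (row_loss + col_loss)
```

With embeddings that carry no information (all zeros), every candidate in a batch of size n is equally likely and the loss equals log n. The loss approaches zero as each pair's similarity comes to dominate its row and column.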
-
Medical image segmentation with imperfect 3D bounding boxes
Authors:
Ekaterina Redekop,
Alexey Chernyavskiy
Abstract:
The development of high-quality medical image segmentation algorithms depends on the availability of large datasets with pixel-level labels. The challenges of collecting such datasets, especially in the case of 3D volumes, motivate the development of approaches that can learn from other types of labels that are cheap to obtain, e.g. bounding boxes. We focus on 3D medical images with their corresponding 3D bounding boxes, which are treated as series of per-slice non-tight 2D bounding boxes. While current weakly-supervised approaches that use 2D bounding boxes as weak labels can be applied to medical image segmentation, we show that their success is limited in cases when the assumption about the tightness of the bounding boxes breaks. We propose a new bounding box correction framework which is trained on a small set of pixel-level annotations to improve the tightness of a larger set of non-tight bounding box annotations. The effectiveness of our solution is demonstrated by evaluating a known weakly-supervised segmentation approach with and without the proposed bounding box correction algorithm. When the tightness is improved by our solution, the results of the weakly-supervised segmentation become much closer to those of the fully-supervised one.
Submitted 6 August, 2021;
originally announced August 2021.
-
Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation
Authors:
Ivan Zakazov,
Boris Shirokikh,
Alexey Chernyavskiy,
Mikhail Belyaev
Abstract:
Domain Adaptation (DA) methods are widely used in medical image segmentation tasks to tackle the problem of differently distributed train (source) and test (target) data. We consider the supervised DA task with a limited number of annotated samples from the target domain. It corresponds to one of the most relevant clinical setups: building a sufficiently accurate model on the minimum possible amount of annotated data. Existing methods mostly fine-tune specific layers of the pretrained Convolutional Neural Network (CNN). However, there is no consensus on which layers are better to fine-tune, e.g. the first layers for images with low-level domain shift or the deeper layers for images with high-level domain shift. To this end, we propose SpotTUnet, a CNN architecture that automatically chooses the layers which should be optimally fine-tuned. More specifically, on the target domain, our method additionally learns the policy that indicates whether a specific layer should be fine-tuned or reused from the pretrained network. First, we show that our method performs at the same level as the best of the nonflexible fine-tuning methods even under the extreme scarcity of annotated data. Second, we show that the SpotTUnet policy provides a layer-wise visualization of the domain shift impact on the network, which could be further used to develop robust domain generalization methods. In order to extensively evaluate SpotTUnet performance, we use a publicly available dataset of brain MR images (CC359), characterized by explicit domain shift. We release a reproducible experimental pipeline.
Submitted 10 July, 2021;
originally announced July 2021.
-
WhatTheWikiFact: Fact-Checking Claims Against Wikipedia
Authors:
Anton Chernyavskiy,
Dmitry Ilvovsky,
Preslav Nakov
Abstract:
The rise of the Internet has made it a major source of information. Unfortunately, not all information online is true, and thus a number of fact-checking initiatives have been launched, both manual and automatic, to deal with the problem. Here, we present our contribution in this regard: WhatTheWikiFact, a system for automatic claim verification using Wikipedia. The system can predict the veracity of an input claim, and it further shows the evidence it has retrieved as part of the verification process. It shows confidence scores and a list of relevant Wikipedia articles, together with detailed information about each article, including the phrase used to retrieve it, the most relevant sentences extracted from it and their stance with respect to the input claim, as well as the associated probabilities. The system supports several languages: Bulgarian, English, and Russian.
Submitted 10 October, 2021; v1 submitted 16 April, 2021;
originally announced May 2021.
-
Transformers: "The End of History" for NLP?
Authors:
Anton Chernyavskiy,
Dmitry Ilvovsky,
Preslav Nakov
Abstract:
Recent advances in neural architectures, such as the Transformer, coupled with the emergence of large-scale pre-trained models such as BERT, have revolutionized the field of Natural Language Processing (NLP), pushing the state of the art for a number of NLP tasks. A rich family of variations of these models has been proposed, such as RoBERTa, ALBERT, and XLNet, but fundamentally, they all remain limited in their ability to model certain kinds of information, and they cannot cope with certain information sources that were easy for pre-existing models to handle. Thus, here we aim to shed light on some important theoretical limitations of pre-trained BERT-style models that are inherent in the general Transformer architecture. First, we demonstrate in practice, on two general types of tasks (segmentation and segment labeling) and on four datasets, that these limitations are indeed harmful and that addressing them, even in some very simple and naive ways, can yield sizable improvements over vanilla RoBERTa and XLNet models. Then, we offer a more general discussion on desiderata for future additions to the Transformer architecture that would increase its expressiveness, which we hope could help in the design of the next generation of deep NLP architectures.
Submitted 23 September, 2021; v1 submitted 9 April, 2021;
originally announced May 2021.
-
Uncertainty-based method for improving poorly labeled segmentation datasets
Authors:
Ekaterina Redekop,
Alexey Chernyavskiy
Abstract:
The success of modern deep learning algorithms for image segmentation heavily depends on the availability of large datasets with clean pixel-level annotations (masks), where the objects of interest are accurately delineated. Lack of time and expertise during data annotation leads to incorrect boundaries and label noise. It is known that deep convolutional neural networks (DCNNs) can memorize even completely random labels, resulting in poor accuracy. We propose a framework to train binary segmentation DCNNs using sets of unreliable pixel-level annotations. Erroneously labeled pixels are identified based on the estimated aleatoric uncertainty of the segmentation and are relabeled to the true value.
Submitted 16 February, 2021;
originally announced February 2021.
-
No-reference denoising of low-dose CT projections
Authors:
Elvira Zainulina,
Alexey Chernyavskiy,
Dmitry V. Dylov
Abstract:
Low-dose computed tomography (LDCT) has become a clear trend in radiology, driven by the aspiration to avoid delivering excessive X-ray radiation to patients. The reduction of the radiation dose decreases the risks to the patients but raises the noise level, affecting the quality of the images and their ultimate diagnostic value. One mitigation option is to consider pairs of low-dose and high-dose CT projections to train a denoising model using deep learning algorithms; however, such pairs are rarely available in practice. In this paper, we present a new self-supervised method for CT denoising. Unlike existing self-supervised approaches, the proposed method requires only noisy CT projections and exploits the connections between adjacent images. The experiments carried out on an LDCT dataset demonstrate that our method is almost as accurate as the supervised approach, while also outperforming the considered self-supervised denoising methods.
Submitted 3 February, 2021;
originally announced February 2021.
-
First U-Net Layers Contain More Domain Specific Information Than The Last Ones
Authors:
Boris Shirokikh,
Ivan Zakazov,
Alexey Chernyavskiy,
Irina Fedulova,
Mikhail Belyaev
Abstract:
The appearance of MRI scans depends significantly on scanning protocols and, consequently, on the data-collection institution. These variations between clinical sites result in dramatic drops of CNN segmentation quality on unseen domains. Many of the recently proposed MRI domain adaptation methods operate with the last CNN layers to suppress domain shift. At the same time, the core manifestation of MRI variability is a considerable diversity of image intensities. We hypothesize that these differences can be eliminated by modifying the first layers rather than the last ones. To validate this simple idea, we conducted a set of experiments with brain MRI scans from six domains. Our results demonstrate that 1) domain shift may deteriorate the quality even for a simple brain extraction segmentation task (surface Dice Score drops from 0.85-0.89 to as low as 0.09); 2) fine-tuning of the first layers significantly outperforms fine-tuning of the last layers in almost all supervised domain adaptation setups. Moreover, fine-tuning of the first layers is a better strategy than fine-tuning of the whole network if the amount of annotated data from the new domain is strictly limited.
Submitted 17 August, 2020;
originally announced August 2020.
-
aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning
Authors:
Anton Chernyavskiy,
Dmitry Ilvovsky,
Preslav Nakov
Abstract:
We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task, the consistency between nested spans, repetitions, and labels from similar spans in training. We achieved sizable improvements over baseline fine-tuned RoBERTa models, and the official evaluation ranked our system 3rd (almost tied with the 2nd) out of 36 teams on the span identification subtask with an F1 score of 0.491, and 2nd (almost tied with the 1st) out of 31 teams on the technique classification subtask with an F1 score of 0.62.
Submitted 6 August, 2020;
originally announced August 2020.