Search | arXiv e-print repository

Image Reconstruction as a Tool for Feature Analysis

Authors: Eduard Allakhverdov, Dmitrii Tarasov, Elizaveta Goncharova, Andrey Kuznetsov

Abstract: Vision encoders are increasingly used in modern applications, from vision-only models to multimodal systems such as vision-language models. Despite their remarkable success, it remains unclear how these architectures represent features internally. Here, we propose a novel approach for interpreting vision features via image reconstruction. We compare two related model families, SigLIP and SigLIP2, which differ only in their training objective, and show that encoders pre-trained on image-based tasks retain significantly more image information than those trained on non-image tasks such as contrastive learning. We further apply our method to a range of vision encoders, ranking them by the informativeness of their feature representations. Finally, we demonstrate that manipulating the feature space yields predictable changes in reconstructed images, revealing that orthogonal rotations (rather than spatial transformations) control color encoding. Our approach can be applied to any vision encoder, shedding light on the inner structure of its feature space. The code and model weights to reproduce the experiments are available on GitHub.

Submitted 9 June, 2025; originally announced June 2025.

Comments: 23 pages, 14 figures

MSC Class: 68T10; 68T30; 68T45 ACM Class: I.2.10

arXiv:2505.22914 [pdf, ps, other]

cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

Authors: Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich

Abstract: Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLM), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Preference Optimization (GRPO) outperform offline alternatives. On the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously. More importantly, after RL fine-tuning, cadrille sets a new state of the art on three challenging datasets, including a real-world one.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2502.17666 [pdf, other]

Yes, Q-learning Helps Offline In-Context RL

Authors: Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov

Abstract: Existing offline in-context reinforcement learning (ICRL) methods have predominantly relied on supervised training objectives, which are known to have limitations in offline RL settings. In this study, we explore the integration of RL objectives within an offline ICRL framework. Through experiments on more than 150 GridWorld and MuJoCo environment-derived datasets, we demonstrate that optimizing RL objectives directly improves performance by approximately 30% on average compared to widely adopted Algorithm Distillation (AD), across various dataset coverages, structures, expertise levels, and environmental complexities. Furthermore, in the challenging XLand-MiniGrid environment, RL objectives doubled the performance of AD. Our results also reveal that the addition of conservatism during value learning brings additional improvements in almost all settings tested. Our findings emphasize the importance of aligning ICRL learning objectives with the RL reward-maximization goal, and demonstrate that offline RL is a promising direction for advancing ICRL.

Submitted 19 May, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.15381 [pdf, other]

MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing

Authors: Matvey Skripkin, Elizaveta Goncharova, Dmitrii Tarasov, Andrey Kuznetsov

Abstract: Multimodal language models (MLMs) integrate visual and textual information by coupling a vision encoder with a large language model through a dedicated adapter. While existing approaches commonly rely on a single pre-trained vision encoder, a wide variety of specialized encoders can boost a model's performance in distinct domains. In this work, we propose MOVE (Mixture of Vision Encoders), a simple yet effective approach to leverage multiple pre-trained encoders for specialized multimodal tasks. MOVE automatically routes inputs to the most appropriate encoder among candidates such as Unichat, InternViT, and Texify, thereby enhancing performance across a diverse set of benchmarks, including ChartQA, MMBench, and MMMU. Experimental results demonstrate that MOVE achieves competitive accuracy without incurring the complexities of image slicing for high-resolution images.

Submitted 21 February, 2025; originally announced February 2025.

Comments: 10 pages, 6 figures, 4 tables

MSC Class: 6804; 68T50 (Primary) ACM Class: I.2.7; I.2.10; I.4.9

arXiv:2502.09680 [pdf, ps, other]

Object-Centric Latent Action Learning

Authors: Albina Klepach, Alexander Nikulin, Ilya Zisman, Denis Tarasov, Alexander Derevyagin, Andrei Polubarov, Nikita Lyubaykin, Vladislav Kurenkov

Abstract: Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although recent latent action policy optimization (LAPO) has shown promise in inferring proxy-action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that centers on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle action-related and distracting dynamics. This allows LAPO to focus on task-relevant interactions, resulting in more robust proxy-action labels, enabling better imitation learning and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluated our method in eight visually complex tasks across the Distracting Control Suite (DCS) and Distracting MetaWorld (DMW). Our results show that object-centric pretraining mitigates the negative effects of distractors by 50%, as measured by downstream task performance: average return (DCS) and success rate (DMW).

Submitted 12 June, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

Comments: Accepted by Workshop on World Models at ICLR 2025

arXiv:2502.00379 [pdf, ps, other]

Latent Action Learning Requires Supervision in the Presence of Distractors

Authors: Alexander Nikulin, Ilya Zisman, Denis Tarasov, Nikita Lyubaykin, Andrei Polubarov, Igor Kiselev, Vladislav Kurenkov

Abstract: Recently, latent action learning, pioneered by Latent Action Policies (LAPO), has shown remarkable pre-training efficiency on observation-only data, offering potential for leveraging vast amounts of video available on the web for embodied AI. However, prior work has focused on distractor-free data, where changes between observations are primarily explained by ground-truth actions. Unfortunately, real-world videos contain action-correlated distractors that may hinder latent action learning. Using the Distracting Control Suite (DCS), we empirically investigate the effect of distractors on latent action learning and demonstrate that LAPO struggles in such scenarios. We propose LAOM, a simple LAPO modification that improves the quality of latent actions by 8x, as measured by linear probing. Importantly, we show that providing supervision with ground-truth actions, as few as 2.5% of the full dataset, during latent action learning improves downstream performance by 4.2x on average. Our findings suggest that integrating supervision during Latent Action Models (LAM) training is critical in the presence of distractors, challenging the conventional pipeline of first learning LAM and only then decoding from latent to ground-truth actions.

Submitted 12 June, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

Comments: ICML 2025, Poster, Project Page: https://laom.dunnolab.ai/, Source code: https://github.com/dunnolab/laom

arXiv:2501.19400 [pdf, other]

Vintix: Action Model via In-Context Reinforcement Learning

Authors: Andrey Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Ilya Zisman, Denis Tarasov, Alexander Nikulin, Vladislav Kurenkov

Abstract: In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation to construct versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems. Code to be released at https://github.com/dunnolab/vintix

Submitted 31 January, 2025; originally announced January 2025.

Comments: Preprint. In review

arXiv:2411.01958 [pdf, other]

N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs

Authors: Ilya Zisman, Alexander Nikulin, Viacheslav Sinii, Denis Tarasov, Nikita Lyubaykin, Andrei Polubarov, Igor Kiselev, Vladislav Kurenkov

Abstract: In-context learning allows models like transformers to adapt to new tasks from a few examples without updating their weights, a desirable trait for reinforcement learning (RL). However, existing in-context RL methods, such as Algorithm Distillation (AD), demand large, carefully curated datasets and can be unstable and costly to train due to the transient nature of in-context learning abilities. In this work, we integrated the n-gram induction heads into transformers for in-context RL. By incorporating these n-gram attention patterns, we considerably reduced the amount of data required for generalization and eased the training process by making models less sensitive to hyperparameters. Our approach matches, and in some cases surpasses, the performance of AD in both grid-world and pixel-based environments, suggesting that n-gram induction heads could improve the efficiency of in-context RL.

Submitted 6 February, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

arXiv:2409.07606 [pdf, other]

The Role of Deep Learning Regularizations on Actors in Offline RL

Authors: Denis Tarasov, Anja Surina, Caglar Gulcehre

Abstract: Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities. However, in the domain of Reinforcement Learning (RL), the application of these techniques has been limited, usually applied to value function estimators (Hiraoka et al., 2021; Smith et al., 2022), and may result in detrimental effects. This issue is even more pronounced in offline RL settings, which bear greater similarity to supervised learning but have received less attention. Recent work in continuous offline RL (Park et al., 2024) has demonstrated that while we can build sufficiently powerful critic networks, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average across two algorithms and three different continuous D4RL domains.

Submitted 21 November, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: https://github.com/DT6A/ActoReg

arXiv:2406.06309 [pdf, other]

Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?

Authors: Denis Tarasov, Kirill Brilliantov, Dmitrii Kharlapenko

Abstract: In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across various domains, as the primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks, without delving into in-depth analysis. Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup and analyze the effects of different aspects on performance. Through large-scale experiments conducted across a diverse range of tasks using different algorithms, we aim to gain deeper insights into the implications of this approach. Our results reveal that incorporating this change can lead to superior performance over state-of-the-art solutions for some algorithms in certain tasks, while maintaining comparable performance levels in other tasks; for other algorithms, however, this modification might lead to a dramatic performance drop. These findings are crucial for further application of the classification approach in research and practical tasks.

Submitted 16 November, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: https://github.com/DT6A/ClORL

arXiv:2402.01812 [pdf, other]

Distilling LLMs' Decomposition Abilities into Compact Language Models

Authors: Denis Tarasov, Kumar Shridhar

Abstract: Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities, yet their large size presents scalability challenges and limits any further customization. In contrast, compact models offer customized training but often fall short in solving complex reasoning tasks. This study focuses on distilling the LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the advancements in the LLMs' capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. The development of an AI-generated dataset and the establishment of baselines constitute the primary contributions of our work, underscoring the potential of compact models in replicating complex problem-solving skills.

Submitted 2 February, 2024; originally announced February 2024.

Comments: https://github.com/DT6A/GSM8K-AI-SubQ

arXiv:2306.08772 [pdf, other]

Katakomba: Tools and Benchmarks for Data-Driven NetHack

Authors: Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

Abstract: NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: resource-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.

Submitted 26 October, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/katakomba

arXiv:2305.09836 [pdf, other]

Revisiting the Minimalist Approach to Offline Reinforcement Learning

Authors: Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov

Abstract: Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.

Submitted 24 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: Source code: https://github.com/DT6A/ReBRAC

arXiv:2301.13616 [pdf, other]

Anti-Exploration by Random Network Distillation

Authors: Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov

Abstract: Despite the success of Random Network Distillation (RND) in various domains, it was shown to be not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.

Submitted 17 May, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: ICML 2023, Poster, Source code: https://github.com/tinkoff-ai/sac-rnd

arXiv:2211.11096 [pdf, other]

Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows

Authors: Dmitriy Akimov, Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov

Abstract: Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism - i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model - controller in the latent space - is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets.

Submitted 30 January, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2022. Source code: https://github.com/tinkoff-ai/cnf

arXiv:2211.11092 [pdf, other]

Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size

Authors: Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Dmitry Akimov, Sergey Kolesnikov

Abstract: Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average.

Submitted 30 January, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted at 3rd Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2022. Source code: https://github.com/tinkoff-ai/lb-sac

arXiv:2210.07105 [pdf, other]

CORL: Research-oriented Deep Offline Reinforcement Learning Library

Authors: Denis Tarasov, Alexander Nikulin, Dmitry Akimov, Vladislav Kurenkov, Sergey Kolesnikov

Abstract: CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple developing experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.

Submitted 26 October, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks. Source code at https://github.com/corl-team/CORL

arXiv:1912.04619 [pdf]

Inception Architecture and Residual Connections in Classification of Breast Cancer Histology Images

Authors: Mohammad Ibrahim Sarker, Hyongsuk Kim, Denis Tarasov, Dinar Akhmetzanov

Abstract: This paper presents results of applying the Inception v4 deep convolutional neural network to the ICIAR-2018 Breast Cancer Classification Grand Challenge, part a. The Challenge task is to classify breast cancer biopsy results, presented in the form of hematoxylin and eosin stained images. Breast cancer classification is of primary interest to medical practitioners, and thus binary classification of breast cancer images has been under investigation by many researchers, but multi-class categorization of histology breast images has been challenging due to the subtle differences among the categories. In this work, extensive data augmentation is conducted to reduce overfitting, and the effectiveness of a committee of several Inception v4 networks is studied. We report 89% accuracy on the 4-class classification task and 93.7% on the carcinoma/non-carcinoma two-class classification task using our test set of 80 images.

Submitted 10 December, 2019; originally announced December 2019.

Comments: Achieved 23rd place out of 50 accepted positions (ICIAR Grand Challenge on breast cancer histology images)

arXiv:1504.03876 [pdf, ps, other]

Light exotic nuclei with extreme neutron excess and $2 \leq Z \leq 8$

Authors: V. N. Tarasov, K. A. Gridnev, V. I. Kuprikov, D. K. Gridnev, D. V. Tarasov, K. S. Godbey, X. Viñas, Walter Greiner

Abstract: Using the HF+BCS method, we study light nuclei with nuclear charge in the range $2 \leq Z \leq 8$ lying near the neutron drip line. The HF method uses effective Skyrme forces and allows for axial deformations. We find that the neutron drip line forms stability peninsulas at $^{18}$He and $^{40}$C. These isotopes are found to be stable against one-neutron emission and possess the highest known neutron-to-proton ratio in stable nuclei.

Submitted 15 April, 2015; originally announced April 2015.

arXiv:1310.1024 [pdf, ps, other]

A Simple Method for Generating Electromagnetic Oscillations

Authors: Vyacheslav Buts, Dmitriy Vavriv, Oleg Nechayev, Dmitriy Tarasov

Abstract: We propose a novel approach to the generation of electromagnetic oscillations by means of a low-frequency pumping of two coupled linear oscillators. A theory of such generation mechanism is proposed, and its feasibility is demonstrated by using coupled RLC oscillators. A comparison of the theoretical results and the experimental data is presented.

Submitted 23 August, 2013; originally announced October 2013.

Comments: 5 pages, 7 figures

arXiv:1210.6788 [pdf]

Peculiarity of chaotic and regular dynamics of waves

Authors: Vyacheslav Buts, Igor Kovalchuk, Dmytro Tarasov, Alexander Tolstoluzhsky

Abstract: It is shown that weakly nonlinear interaction of waves admits both modes with chaotic dynamics and modes with an increasing degree of coherence. Conditions are found under which they arise. One type of such interaction is decay. An important feature of such processes in plasma is modes with cascades, which arise when the high-frequency wave produced by a decay can itself take part in a new decay. Chaotic dynamics of decays can be used to form spectra of excited oscillations. To realize this possibility, the dispersive properties of a cylindrical waveguide partially filled with magnetoactive plasma are investigated. It is shown that such an electrodynamic structure is convenient for realizing both regular and chaotic modes.

Submitted 25 October, 2012; originally announced October 2012.

arXiv:1107.1055 [pdf, ps, other]

doi 10.1142/S0218301312500474

The Quest for the Heaviest Uranium Isotope

Authors: S. Schramm, D. Gridnev, D. V. Tarasov, V. N. Tarasov, W. Greiner

Abstract: We study Uranium isotopes and surrounding elements at very large neutron number excess. Relativistic mean field and Skyrme-type approaches with different parametrizations are used in the study. Most models show clear indications for isotopes that are stable with respect to neutron emission far beyond N=184 up to the range of around N=258.

Submitted 17 January, 2012; v1 submitted 6 July, 2011; originally announced July 2011.

Comments: 4 pages, 5 figures

arXiv:1106.5910 [pdf, ps, other]

Stability Peninsulas on the Neutron Drip Line

Authors: V. N. Tarasov, K. A. Gridnev, D. K. Gridnev, D. V. Tarasov, S. Schramm, X. Viñas, Walter Greiner

Abstract: Using the HF+BCS method with Skyrme forces, we analyze the neutron drip line. It is shown that around magic and new magic numbers the drip line may form stability peninsulas. It is shown that the location of these peninsulas does not depend on the choice of Skyrme forces. It is found that the size of the peninsulas is sensitive to the choice of Skyrme forces and that the most extended peninsulas appear with the SkI2 set.

Submitted 28 November, 2011; v1 submitted 29 June, 2011; originally announced June 2011.

arXiv:nucl-th/0411096 [pdf, ps, other]

doi 10.1142/S0218301306004053

On stability of the neutron rich Oxygen isotopes

Authors: K. A. Gridnev, D. K. Gridnev, V. G. Kartavenko, V. E. Mitroshin, V. N. Tarasov, D. V. Tarasov, W. Greiner

Abstract: Stability with respect to neutron emission is studied for highly neutron-excessive Oxygen isotopes in the framework of the Hartree-Fock-Bogoliubov approach with Skyrme forces SLy4 and Ska. Our calculations show an increase of stability around $^{40}$O.

Submitted 24 November, 2004; originally announced November 2004.

Comments: 5 pages, 3 figures

Journal ref: Int.J.Mod.Phys. E15 (2006) 673-684

Showing 1–24 of 24 results for author: Tarasov, D