
Showing 1–50 of 56 results for author: Panov, A

  1. arXiv:2507.05135

    cs.RO

    LERa: Replanning with Visual Feedback in Instruction Following

    Authors: Svyatoslav Pchelintsev, Maxim Patratskiy, Anatoly Onishchenko, Alexandr Korchemnyi, Aleksandr Medvedev, Uliana Vinogradova, Ilya Galuzinsky, Aleksey Postnikov, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: Large Language Models are increasingly used in robotics for task planning, but their reliance on textual inputs limits their adaptability to real-world changes and failures. To address these challenges, we propose LERa (Look, Explain, Replan), a Visual Language Model-based replanning approach that uses visual feedback. Unlike existing methods, LERa requires only a raw RGB image, a natural lan…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: IROS 2025

  2. arXiv:2507.05118

    cs.RO cs.AI

    VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots

    Authors: Danil S. Grigorev, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: In the field of robotics, researchers face a critical challenge in ensuring reliable and efficient task planning. Verifying high-level task plans before execution significantly reduces errors and enhances the overall performance of these systems. In this paper, we propose an architecture for automatically verifying high-level task plans before their execution in simulator or real-world environments…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: IROS 2025

  3. arXiv:2506.23793

    cs.AI cs.LG cs.MA

    Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

    Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik

    Abstract: Multi-agent pathfinding (MAPF) is a common abstraction of multi-robot trajectory planning problems, where multiple homogeneous robots simultaneously move in a shared environment. While solving MAPF optimally has been proven to be NP-hard, scalable and efficient solvers are vital for real-world applications such as logistics and search-and-rescue. To this end, decentralized suboptimal MAPF solve…

    Submitted 30 June, 2025; originally announced June 2025.

  4. arXiv:2506.21782

    cs.LG cs.RO

    M3PO: Massively Multi-Task Model-Based Policy Optimization

    Authors: Aditya Narendra, Dmitry Makarov, Aleksandr Panov

    Abstract: We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task domains. Existing model-based approaches like DreamerV3 rely on pixel-level generative models that neglect control-centric representations, while model-free meth…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 6 pages, 4 figures. Accepted at IEEE/RSJ IROS 2025. Full version, including appendix and implementation details

  5. arXiv:2506.04828

    cs.AI

    Safe Planning and Policy Optimization via World Model Learning

    Authors: Artem Latyshev, Gregory Gorbov, Aleksandr I. Panov

    Abstract: Reinforcement Learning (RL) applications in real-world scenarios must prioritize safety and reliability, which impose strict constraints on agent behavior. Model-based RL leverages predictive world models for action planning and policy optimization, but inherent model inaccuracies can lead to catastrophic failures in safety-critical settings. We propose a novel model-based RL framework that jointl…

    Submitted 5 June, 2025; originally announced June 2025.

  6. arXiv:2506.04505

    cs.RO cs.LG

    SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning

    Authors: Nikita Oskolkov, Huzhenyu Zhang, Dmitry Makarov, Dmitry Yudin, Aleksandr Panov

    Abstract: The 3D scene graph models spatial relationships between objects, enabling the agent to efficiently navigate in a partially observable environment and predict the location of the target object. This paper proposes an original framework named SGN-CIRL (3D Scene Graph-Based Reinforcement Learning Navigation) for mapless reinforcement learning-based robot navigation with learnable representation of ope…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: 7 pages, 11 figures

  7. arXiv:2506.04089

    cs.LG cs.AI cs.CL cs.RO

    AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

    Authors: Anastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: As a part of an embodied agent, Large Language Models (LLMs) are typically used for behavior planning given natural language instructions from the user. However, dealing with ambiguous instructions in real-world environments remains a challenge for LLMs. Various methods for task ambiguity detection have been proposed. However, it is difficult to compare them because they are tested on different da…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: ACL 2025 (Main Conference)

  8. arXiv:2505.11962

    cs.AI

    CrafText Benchmark: Advancing Instruction Following in Complex Multimodal Open-Ended World

    Authors: Zoya Volovikova, Gregory Gorbov, Petr Kuderov, Aleksandr I. Panov, Alexey Skrynnik

    Abstract: Following instructions in real-world conditions requires the ability to adapt to the world's volatility and entanglement: the environment is dynamic and unpredictable, instructions can be linguistically complex with diverse vocabulary, and the number of possible goals an agent may encounter is vast. Despite extensive research in this area, most studies are conducted in static environments with sim…

    Submitted 17 May, 2025; originally announced May 2025.

  9. arXiv:2504.07708

    cs.RO

    TOCALib: Optimal control library with interpolation for bimanual manipulation and obstacles avoidance

    Authors: Yulia Danik, Dmitry Makarov, Aleksandra Arkhipova, Sergei Davidenko, Aleksandr Panov

    Abstract: The paper presents a new approach for constructing a library of optimal trajectories for two robotic manipulators, Two-Arm Optimal Control and Avoidance Library (TOCALib). The optimisation takes into account kinodynamic and other constraints within the FROST framework. The novelty of the method lies in the consideration of collisions using the DCOL method, which allows obtaining symbolic expressio…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 10 pages, 14 figures, 3 tables, 2 algorithms, 1 appendix

  10. arXiv:2502.10550

    cs.LG cs.AI cs.RO

    Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

    Authors: Egor Cherepanov, Nikita Kachaev, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial…

    Submitted 10 June, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 42 pages, 2 figures

  11. arXiv:2412.19847

    cs.CV cs.AI cs.LG

    Symbolic Disentangled Representations for Images

    Authors: Alexandr Korchemnyi, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: The idea of disentangled representations is to reduce the data to a set of generative factors that produce it. Typically, such representations are vectors in latent space, where each coordinate corresponds to one of the generative factors. The object can then be modified by changing the value of a particular coordinate, but it is necessary to determine which coordinate corresponds to the desired g…

    Submitted 25 December, 2024; originally announced December 2024.

    Comments: 14 pages, 14 figures

  12. arXiv:2412.06531

    cs.LG cs.AI

    Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

    Authors: Egor Cherepanov, Nikita Kachaev, Artem Zholus, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: The incorporation of memory into agents is essential for numerous tasks within the domain of Reinforcement Learning (RL). In particular, memory is paramount for tasks that require the utilization of past information, adaptation to novel environments, and improved sample efficiency. However, the term "memory" encompasses a wide range of concepts, which, coupled with the lack of a unified methodol…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: 18 pages, 6 figures

  13. arXiv:2410.06819

    cs.RO cs.AI

    Dynamic Neural Potential Field: Online Trajectory Optimization in Presence of Moving Obstacles

    Authors: Aleksey Staroverov, Muhammad Alhaddad, Aditya Narendra, Konstantin Mironov, Aleksandr Panov

    Abstract: We address the task of local trajectory planning for a mobile robot in the presence of static and dynamic obstacles. The local trajectory is obtained as a numerical solution of the Model Predictive Control (MPC) problem. Collision avoidance may be provided by adding a repulsive potential of the obstacles to the cost function of MPC. We develop an approach where the repulsive potential is estimated by the n…

    Submitted 9 October, 2024; originally announced October 2024.

  14. arXiv:2409.10165

    cs.RO

    Maneuver Decision-Making with Trajectory Streams Prediction for Autonomous Vehicles

    Authors: Mais Jamal, Aleksandr Panov

    Abstract: Decision-making, motion planning, and trajectory prediction are crucial in autonomous driving systems. By accurately forecasting the movements of other road users, the decision-making capabilities of the autonomous system can be enhanced, making it more effective in responding to dynamic and unpredictable environments and more adaptive to diverse road scenarios. This paper presents the FFStreams++…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 17 pages, 8 figures

  15. arXiv:2409.00134

    cs.MA cs.AI cs.LG

    MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

    Authors: Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik

    Abstract: Multi-agent pathfinding (MAPF) is a problem that generally requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally, even under restrictive assumptions, is NP-hard, yet efficient solutions for this problem are critical for numerous applications, such as automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have…

    Submitted 8 April, 2025; v1 submitted 29 August, 2024; originally announced September 2024.

  16. arXiv:2408.13881

    cs.RO cs.LG

    Safe Policy Exploration Improvement via Subgoals

    Authors: Brian Angulo, Gregory Gorbov, Aleksandr Panov, Konstantin Yakovlev

    Abstract: Reinforcement learning is a widely used approach to autonomous navigation, showing potential in various tasks and robotic setups. Still, it often struggles to reach distant goals when safety constraints are imposed (e.g., the wheeled robot is prohibited from moving close to the obstacles). One of the main reasons for poor performance in such setups, which is common in practice, is that the need to…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures

  17. arXiv:2407.14931

    cs.LG cs.AI cs.MA

    POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

    Authors: Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov, Alexander Chernyavskiy, Konstantin Yakovlev, Aleksandr Panov

    Abstract: Multi-agent reinforcement learning (MARL) has recently excelled in solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g.,…

    Submitted 8 April, 2025; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: Published as a conference paper at The International Conference on Learning Representations 2025

  18. arXiv:2407.13518

    cs.LG

    Model-based Policy Optimization using Symbolic World Model

    Authors: Andrey Gorodetskiy, Konstantin Mironov, Aleksandr Panov

    Abstract: The application of learning-based control methods in robotics presents significant challenges. One is that model-free reinforcement learning algorithms use observation data with low sample efficiency. To address this challenge, a prevalent approach is model-based reinforcement learning, which involves employing an environment dynamics model. We suggest approximating transition dynamics with symbol…

    Submitted 18 July, 2024; originally announced July 2024.

  19. arXiv:2407.09287

    cs.AI

    Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments

    Authors: Zoya Volovikova, Alexey Skrynnik, Petr Kuderov, Aleksandr I. Panov

    Abstract: In this study, we address the issue of enabling an artificial intelligence agent to execute complex language instructions within virtual environments. In our framework, we assume that these instructions involve intricate linguistic structures and multiple interdependent tasks that must be navigated successfully to achieve the desired outcomes. To effectively manage these complexities, we propose a…

    Submitted 12 July, 2024; originally announced July 2024.

  20. arXiv:2312.15908

    cs.AI cs.LG cs.MA

    Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding

    Authors: Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov

    Abstract: The Multi-Agent Pathfinding (MAPF) problem involves finding a set of conflict-free paths for a group of agents confined to a graph. In typical MAPF scenarios, the graph and the agents' starting and ending vertices are known beforehand, allowing the use of centralized planning algorithms. However, in this study, we focus on the decentralized MAPF setting, where the agents may observe the other agen…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: The paper is accepted to AAAI-2024 conference

  21. arXiv:2311.06295

    physics.chem-ph cs.LG

    Gradual Optimization Learning for Conformational Energy Minimization

    Authors: Artem Tsypin, Leonid Ugadiarov, Kuzma Khrabrov, Alexander Telepov, Egor Rumiantsev, Alexey Skrynnik, Aleksandr I. Panov, Dmitry Vetrov, Elena Tutubalina, Artur Kadurin

    Abstract: Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to acc…

    Submitted 12 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Published as a conference paper at ICLR2024 (Poster)

  22. arXiv:2311.04640

    cs.LG cs.AI cs.CV

    Object-Centric Learning with Slot Mixture Module

    Authors: Daniil Kirilenko, Vitaliy Vorobyov, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where the cluster's center in latent space serves as a slot representation. Slot Attention is an example of such a method, acting as a learnable analog of the soft k-means al…

    Submitted 25 December, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Published as a conference paper at ICLR 2024

  23. arXiv:2311.04107

    cs.RO cs.CV

    Interactive Semantic Map Representation for Skill-based Visual Object Navigation

    Authors: Tatiana Zemskova, Aleksei Staroverov, Kirill Muravyev, Dmitry Yudin, Aleksandr Panov

    Abstract: Visual object navigation using learning methods is one of the key tasks in mobile robotics. This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. It is based on a neural network method that adjusts the weights of the segmentation model with backpropagation of the predicted fusion loss values during inference on…

    Submitted 7 November, 2023; originally announced November 2023.

  24. arXiv:2310.17178

    cs.AI cs.LG cs.RO

    Relational Object-Centric Actor-Critic

    Authors: Leonid Ugadiarov, Vitaliy Vorobyov, Aleksandr I. Panov

    Abstract: The advances in unsupervised object-centric representation learning have significantly improved its application to downstream tasks. Recent works highlight that disentangled object representations can aid policy learning in image-based, object-centric reinforcement learning tasks. This paper proposes a novel object-centric reinforcement learning algorithm that integrates actor-critic and model-bas…

    Submitted 20 March, 2025; v1 submitted 26 October, 2023; originally announced October 2023.

  25. arXiv:2310.16362

    cs.RO cs.LG

    Neural Potential Field for Obstacle-Aware Local Motion Planning

    Authors: Muhammad Alhaddad, Konstantin Mironov, Aleksey Staroverov, Aleksandr Panov

    Abstract: Model predictive control (MPC) may provide local motion planning for mobile robotic platforms. The challenging aspect is the analytic representation of collision cost for the case when both the obstacle map and robot footprint are arbitrary. We propose a Neural Potential Field: a neural network model that returns a differentiable collision cost based on robot pose, obstacle map, and robot footprin…

    Submitted 25 October, 2023; originally announced October 2023.

  26. arXiv:2310.13391

    cs.LG cs.AI cs.NE

    Learning Successor Features with Distributed Hebbian Temporal Memory

    Authors: Evgenii Dzhivelikian, Petr Kuderov, Aleksandr I. Panov

    Abstract: This paper presents a novel approach to address the challenge of online sequence learning for decision making under uncertainty in non-stationary, partially observable environments. The proposed algorithm, Distributed Hebbian Temporal Memory (DHTM), is based on the factor graph formalism and a multi-component neuron model. DHTM aims to capture sequential data relationships and make cumulative pred…

    Submitted 1 June, 2025; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: Poster at ICLR 2025

  27. arXiv:2310.12031

    cs.CV cs.AI cs.LG

    SegmATRon: Embodied Adaptive Semantic Segmentation for Indoor Environment

    Authors: Tatiana Zemskova, Margarita Kichik, Dmitry Yudin, Aleksei Staroverov, Aleksandr Panov

    Abstract: This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss function. We studied this model on datasets collected in the photorealistic Habitat and the synthetic AI2-THOR Simulators. We showed that obtaining additional imag…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 14 pages, 6 figures

  28. arXiv:2310.01207

    cs.AI cs.MA

    Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning

    Authors: Alexey Skrynnik, Anton Andreychuk, Maria Nesterova, Konstantin Yakovlev, Aleksandr Panov

    Abstract: The Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that possesses all the information on the agents' locations and goals is absent and the agents have to sequentially de…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 12 pages, 11 figures

  29. arXiv:2307.14568

    cs.RO cs.AI cs.LG

    Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

    Authors: Brian Angulo, Gregory Gorbov, Aleksandr Panov, Konstantin Yakovlev

    Abstract: While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to real autonomous systems without considering the safety constraints. The latter are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study, we compare two learnable navig…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: 4 pages, 5 figures

  30. arXiv:2307.13453

    cs.AI

    Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results

    Authors: Yelisey Pitanov, Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov

    Abstract: In this work we study the well-known and challenging problem of Multi-agent Pathfinding, in which a set of agents is confined to a graph, each agent is assigned unique start and goal vertices, and the task is to find a set of collision-free paths (one for each agent) such that each agent reaches its respective goal. We investigate how to utilize Monte-Carlo Tree Search (MCTS) to solve the problem. Alth…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: The paper is accepted to HAIS 2023

  31. arXiv:2306.09459

    cs.LG cs.AI

    Recurrent Action Transformer with Memory

    Authors: Egor Cherepanov, Alexey Staroverov, Dmitry Yudin, Alexey K. Kovalev, Aleksandr I. Panov

    Abstract: Recently, the use of transformers in offline reinforcement learning has become a rapidly developing area. This is due to their ability to treat the agent's trajectory in the environment as a sequence, thereby reducing the policy learning problem to sequence modeling. In environments where the agent's decisions depend on past events (POMDPs), it is essential to capture both the event itself and the…

    Submitted 14 October, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 24 pages, 14 figures

  32. arXiv:2301.10067

    cs.LG cs.AI

    Intrinsic Motivation in Model-based Reinforcement Learning: A Brief Review

    Authors: Artem Latyshev, Aleksandr I. Panov

    Abstract: The reinforcement learning research area contains a wide range of methods for solving the problems of intelligent agent control. Despite the progress that has been made, the task of creating a highly autonomous agent is still a significant challenge. One potential solution to this problem is intrinsic motivation, a concept derived from developmental psychology. This review considers the existing m…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 13 pages, 7 figures

  33. Reinforcement Learning with Success Induced Task Prioritization

    Authors: Maria Nesterova, Alexey Skrynnik, Aleksandr Panov

    Abstract: Many challenging reinforcement learning (RL) problems require designing a distribution of tasks that can be applied to train effective policies. This distribution of tasks can be specified by the curriculum. A curriculum is meant to improve the results of learning and accelerate it. We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning, where a task…

    Submitted 30 December, 2022; originally announced January 2023.

    Journal ref: MICAI 2022. Lecture Notes in Computer Science, vol 13612

  34. arXiv:2212.14649

    cs.CV cs.AI

    HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images

    Authors: Dmitry Yudin, Yaroslav Solomentsev, Ruslan Musaev, Aleksei Staroverov, Aleksandr I. Panov

    Abstract: We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments and loop detection in simultaneous localization and mapping. The loop detection sub-task is especially relevant when a robot with an on-board RGB-D camera can drive past the same place ("Point") at different angles. The dataset is based on the popular Habi…

    Submitted 30 December, 2022; originally announced December 2022.

    Comments: Accepted for publishing in proceedings of the 29th International Conference on Neural Information Processing (ICONIP 2022)

  35. arXiv:2212.14307

    cs.RO

    Policy Optimization to Learn Adaptive Motion Primitives in Path Planning with Dynamic Obstacles

    Authors: Brian Angulo, Aleksandr Panov, Konstantin Yakovlev

    Abstract: This paper addresses kinodynamic motion planning for non-holonomic robots in dynamic environments with both static and dynamic obstacles, a challenging problem that still lacks a universal solution. One of the promising approaches to solve it is decomposing the problem into smaller subproblems and combining the local solutions into a global one. The crux of any planning method for non-h…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 8 pages, 10 figures

  36. arXiv:2212.11730

    cs.AI cs.LG

    TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers

    Authors: Daniil Kirilenko, Anton Andreychuk, Aleksandr Panov, Konstantin Yakovlev

    Abstract: Heuristic search algorithms, e.g. A*, are commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games, etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles into account and, thus, the search led by such heuristics performs poorly in the obstacle-rich…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: Pre-print of the paper accepted to AAAI'23

  37. arXiv:2211.06552

    cs.CL cs.AI

    Collecting Interactive Multi-modal Datasets for Grounded Language Understanding

    Authors: Shrestha Mohanty, Negar Arabzadeh, Milagro Teruel, Yuxuan Sun, Artem Zholus, Alexey Skrynnik, Mikhail Burtsev, Kavya Srinet, Aleksandr Panov, Arthur Szlam, Marc-Alexandre Côté, Julia Kiseleva

    Abstract: Human intelligence can adapt remarkably quickly to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research which can enable similar capabilities in machines, we made the following contributions: (1) formalized the co…

    Submitted 21 March, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Journal ref: Interactive Learning for Natural Language Processing NeurIPS 2022 Workshop

  38. arXiv:2211.00688

    cs.AI cs.CL

    Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

    Authors: Alexey Skrynnik, Zoya Volovikova, Marc-Alexandre Côté, Anton Voronov, Artem Zholus, Negar Arabzadeh, Shrestha Mohanty, Milagro Teruel, Ahmed Awadallah, Aleksandr Panov, Mikhail Burtsev, Julia Kiseleva

    Abstract: The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, execution of instructions in real or simulated environments requires verification of the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of bu…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: 6 pages, 3 figures

  39. arXiv:2206.10944

    cs.LG cs.AI cs.MA

    POGEMA: Partially Observable Grid Environment for Multiple Agents

    Authors: Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, Aleksandr I. Panov

    Abstract: We introduce POGEMA (https://github.com/AIRI-Institute/pogema), a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems. This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF, which can serve as an excellent testing ground for planning and learning methods, and t…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: 7 pages, 7 figures

  40. arXiv:2206.00142

    cs.LG cs.AI cs.CL

    IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

    Authors: Artem Zholus, Alexey Skrynnik, Shrestha Mohanty, Zoya Volovikova, Julia Kiseleva, Artur Szlam, Marc-Alexandre Coté, Aleksandr I. Panov

    Abstract: We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language-conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language-conditioned RL, and a combinatorially hard task space (3D block building).

    Submitted 31 May, 2022; originally announced June 2022.

  41. arXiv:2205.13771

    cs.CL

    IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

    Authors: Julia Kiseleva, Alexey Skrynnik, Artem Zholus, Shrestha Mohanty, Negar Arabzadeh, Marc-Alexandre Côté, Mohammad Aliannejadi, Milagro Teruel, Ziming Li, Mikhail Burtsev, Maartje ter Hoeve, Zoya Volovikova, Aleksandr Panov, Yuxuan Sun, Kavya Srinet, Arthur Szlam, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.06536

  42. arXiv:2205.02388

    cs.CL cs.AI

    Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Marc-Alexandre Côté, Katja Hofmann, Ahmed Awadallah, Linar Abdrazakov, Igor Churin, Putra Manggala, Kata Naszadi, Michiel van der Meer, Taewoon Kim

    Abstract: Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Co…

    Submitted 27 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.06536

    Journal ref: Proceedings of Machine Learning Research NeurIPS 2021 Competition and Demonstration Track

  43. arXiv:2110.13241  [pdf, other]

    cs.LG

    Multitask Adaptation by Retrospective Exploration with Learned World Models

    Authors: Artem Zholus, Aleksandr I. Panov

    Abstract: Model-based reinforcement learning (MBRL) allows solving complex tasks in a sample-efficient manner. However, no information is reused between the tasks. In this work, we propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from continuously growing task-agnostic storage. The model is trained to maximize the expected agent's performance by sel…

    Submitted 25 October, 2021; originally announced October 2021.

  44. arXiv:2110.06536  [pdf, other]

    cs.AI

    NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Katja Hofmann, Michel Galley, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…

    Submitted 14 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  45. arXiv:2109.10173  [pdf, other]

    cs.LG cs.AI

    Long-Term Exploration in Persistent MDPs

    Authors: Leonid Ugadiarov, Alexey Skrynnik, Aleksandr I. Panov

    Abstract: Exploration is an essential part of reinforcement learning, which restricts the quality of learned policy. Hard-exploration environments are defined by huge state space and sparse rewards. In such conditions, an exhaustive exploration of the environment is often impossible, and the successful training of an agent requires a lot of interaction steps. In this paper, we propose an exploration method…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: This is a preprint of the paper accepted to MICAI 2021. It contains 13 pages and 6 figures

  46. arXiv:2109.09512  [pdf, other]

    cs.AI cs.RO

    Landmark Policy Optimization for Object Navigation Task

    Authors: Aleksey Staroverov, Aleksandr I. Panov

    Abstract: This work studies object goal navigation task, which involves navigating to the closest object related to the given semantic category in unseen environments. Recent works have shown significant achievements both in the end-to-end Reinforcement Learning approach and modular systems, but need a big step forward to be robust and optimal. We propose a hierarchical method that incorporates standard tas…

    Submitted 17 September, 2021; originally announced September 2021.

  47. arXiv:2108.06148  [pdf, other]

    cs.LG cs.AI

    Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments

    Authors: Vasilii Davydov, Alexey Skrynnik, Konstantin Yakovlev, Aleksandr I. Panov

    Abstract: In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they, typically, rely on the full knowledge of the environment. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these polici…

    Submitted 13 August, 2021; originally announced August 2021.

    Comments: This is a preprint of the paper accepted to RCAI 2021. It contains 11 pages and 5 figures

  48. arXiv:2006.09950  [pdf, other]

    cs.LG cs.AI

    Delta Schema Network in Model-based Reinforcement Learning

    Authors: Andrey Gorodetskiy, Alexandra Shlychkova, Aleksandr I. Panov

    Abstract: This work is devoted to unresolved problems of Artificial General Intelligence - the inefficiency of transfer learning. One of the mechanisms that are used to solve this problem in the area of reinforcement learning is a model-based approach. In the paper we are expanding the schema networks method which allows to extract the logical relationships between objects and actions from the environment d…

    Submitted 8 July, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Published at the AGI 2020 conference

  49. arXiv:2006.09939  [pdf, other]

    cs.LG cs.AI

    Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations

    Authors: Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov

    Abstract: Currently, deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. Often these results are achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. There are two main approaches to improving the sample efficiency of reinforcement learning methods - using hiera…

    Submitted 17 June, 2020; originally announced June 2020.

  50. arXiv:1912.08664  [pdf, other]

    cs.AI

    Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft

    Authors: Alexey Skrynnik, Aleksey Staroverov, Ermek Aitygulov, Kirill Aksenov, Vasilii Davydov, Aleksandr I. Panov

    Abstract: We present Hierarchical Deep Q-Network (HDQfD) that took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data. We present a structured task-dependent replay buffer and adaptive prioritizing tech…

    Submitted 13 July, 2020; v1 submitted 18 December, 2019; originally announced December 2019.