
Showing 1–6 of 6 results for author: Burtsev, M S

Searching in archive cs.
  1. arXiv:2304.11062  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling Transformer to 1M tokens and beyond with RMT

    Authors: Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

    Abstract: A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up…

    Submitted 6 February, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

  2. arXiv:2207.06881  [pdf, other]

    cs.CL cs.LG

    Recurrent Memory Transformer

    Authors: Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev

    Abstract: Transformer-based models show their effectiveness across multiple domains and tasks. The self-attention allows to combine information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by quadratic computational complexity of self-…

    Submitted 8 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  3. arXiv:2006.11527  [pdf, other]

    cs.CL cs.LG cs.NE

    Memory Transformer

    Authors: Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov

    Abstract: Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware representations. However, information about the context is stored mostly in the same element-wise representations. This might limit the processing of properties related…

    Submitted 16 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

  4. arXiv:1905.02662  [pdf, other]

    cs.NE cs.AI cs.LG

    Continual and Multi-task Reinforcement Learning With Shared Episodic Memory

    Authors: Artyom Y. Sorokin, Mikhail S. Burtsev

    Abstract: Episodic memory plays an important role in the behavior of animals and humans. It allows the accumulation of information about current state of the environment in a task-agnostic way. This episodic representation can be later accessed by down-stream tasks in order to make their execution more efficient. In this work, we introduce the neural architecture with shared episodic memory (SEM) for learni…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: Presented at the Task-Agnostic Reinforcement Learning Workshop at ICLR 2019

  5. arXiv:1709.09686  [pdf, ps, other]

    cs.CL

    Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition

    Authors: L. T. Anh, M. Y. Arkhipov, M. S. Burtsev

    Abstract: Named Entity Recognition (NER) is one of the most common tasks of the natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expression of time, currency and others. Although there is a number of approach…

    Submitted 8 October, 2017; v1 submitted 27 September, 2017; originally announced September 2017.

    Comments: Artificial Intelligence and Natural Language Conference (AINL 2017)

  6. arXiv:cs/0110021  [pdf]

    cs.NE

    Alife Model of Evolutionary Emergence of Purposeful Adaptive Behavior

    Authors: Mikhail S. Burtsev, Vladimir G. Redko, Roman V. Gusarev

    Abstract: The process of evolutionary emergence of purposeful adaptive behavior is investigated by means of computer simulations. The model proposed implies that there is an evolving population of simple agents, which have two natural needs: energy and reproduction. Any need is characterized quantitatively by a corresponding motivation. Motivations determine goal-directed behavior of agents. The model dem…

    Submitted 8 October, 2001; originally announced October 2001.

    Comments: 9 pages, 5 figures. Full version of poster presentation at ECAL 2001 (see "Advances in Artificial Life." J. Kelemen, P. Sosik (Eds.), 6th European Conference, ECAL 2001, Prague, Czech Republic, September 10-14, 2001, Proceedings, p. 413.)

    ACM Class: I.2.6; I.2.8; I.2.11