
Showing 1–2 of 2 results for author: Sivtsov, D

  1. arXiv:2506.05229 [pdf, ps, other]

    cs.LG cs.CL

    Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts

    Authors: Danil Sivtsov, Ivan Rodkin, Gleb Kuzmin, Yuri Kuratov, Ivan Oseledets

    Abstract: Transformer models struggle with long-context inference due to their quadratic time and linear memory complexity. Recurrent Memory Transformers (RMTs) offer a solution by reducing the asymptotic cost to linear time and constant memory usage. However, their memory update mechanism leads to sequential execution, causing a performance bottleneck. We introduce Diagonal Batching, a scheduling scheme…

    Submitted 5 June, 2025; originally announced June 2025.

  2. arXiv:2505.09218 [pdf, ps, other]

    cs.LG cs.DC math.OC

    Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods

    Authors: Alexander Tyurin, Danil Sivtsov

    Abstract: We propose a new unifying framework, Birch SGD, for analyzing and designing distributed SGD methods. The central idea is to represent each method as a weighted directed tree, referred to as a computation tree. Leveraging this representation, we introduce a general theoretical result that reduces convergence analysis to studying the geometry of these trees. This perspective yields a purely graph-ba…

    Submitted 25 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.