Showing 1–1 of 1 results for author: Vishnu, V K

  1. arXiv:2311.08123  [pdf, other]

    cs.LG cs.CL

    Memory-efficient Stochastic methods for Memory-based Transformers

    Authors: Vishwajit Kumar Vishnu, C. Chandra Sekhar

    Abstract: Training memory-based transformers can require a large amount of memory and can be quite inefficient. We propose a novel two-phase training mechanism and a novel regularization technique to improve the training efficiency of memory-based transformers, which are often used for long-range context problems. For our experiments, we consider Transformer-XL as our baseline model, which is one of memoryba…

    Submitted 14 November, 2023; originally announced November 2023.