Showing 1–1 of 1 results for author: Baranwal, C

  1. arXiv:2105.13120  [pdf, other]

    cs.LG cs.DC

    Sequence Parallelism: Long Sequence Training from System Perspective

    Authors: Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You

    Abstract: Transformer achieves promising results on various tasks. However, self-attention suffers from quadratic memory requirements with respect to the sequence length. Existing work focuses on reducing time and space complexity from an algorithm perspective. In this work, we propose sequence parallelism, a memory-efficient parallelism method to help us break input sequence length limitation and train wit…

    Submitted 21 May, 2022; v1 submitted 26 May, 2021; originally announced May 2021.