Showing 1–2 of 2 results for author: Abdelmoniem, A M

  1. arXiv:2108.00951

    cs.LG cs.DC math.OC

    Rethinking gradient sparsification as total error minimization

    Authors: Atal Narayan Sahu, Aritra Dutta, Ahmed M. Abdelmoniem, Trambak Banerjee, Marco Canini, Panos Kalnis

    Abstract: Gradient compression is a widely-established remedy to tackle the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-$k$ sparsification, sometimes with $k$ as little as $0.1\%$ of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization pers…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 33 pages, 31 figures

  2. arXiv:1911.08250

    cs.DC cs.LG math.OC

    On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

    Authors: Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis

    Abstract: Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model,…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: To Appear In Proceedings of Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

    Journal ref: In Proceedings of Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020