
Showing 1–6 of 6 results for author: Diskin, M

  1. arXiv:2302.11640  [pdf, ps, other]

    cs.LG

    A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

    Authors: Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova

    Abstract: Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized method…

    Submitted 2 March, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  2. arXiv:2301.11913  [pdf, other]

    cs.DC cs.LG

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

    Authors: Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov

    Abstract: Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel…

    Submitted 29 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures

  3. arXiv:2207.03481  [pdf, other]

    cs.LG cs.DC

    Training Transformers Together

    Authors: Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf

    Abstract: The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, w…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2021 Demonstration Track. 10 pages, 2 figures. Link: https://training-transformers-together.github.io

  4. arXiv:2110.03313  [pdf, other]

    cs.LG stat.ML

    Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

    Authors: Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, Alexander Gasnikov

    Abstract: Variational inequalities in general and saddle point problems in particular are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport and robust optimization. With increasing data and problem sizes necessary to train high performing models across various applications, we need to rely on parallel and distributed computing. However, in distributed tr…

    Submitted 2 April, 2023; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Appears in: Advances in Neural Information Processing Systems 35 (NeurIPS 2022). Minor modifications with respect to the NeurIPS version. 73 pages, 9 algorithms, 2 figures, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2022/hash/5ac1428c23b5da5e66d029646ea3206d-Abstract-Conference.html

  5. arXiv:2106.11257  [pdf, other]

    cs.LG cs.DC math.OC

    Secure Distributed Training at Scale

    Authors: Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin

    Abstract: Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and computer vision. Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address it is for several smaller groups to pool their…

    Submitted 1 January, 2023; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted to International Conference on Machine Learning (ICML 2022). 61 pages, 10 figures. Version 4 fixes inaccuracies in the proofs of Lemmas E.2 and E.4. Code: https://github.com/yandex-research/btard

  6. arXiv:2106.10207  [pdf, other]

    cs.LG cs.DC

    Distributed Deep Learning in Open Collaborations

    Authors: Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko

    Abstract: Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a…

    Submitted 8 November, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2021. 32 pages, 10 figures. Code: https://github.com/yandex-research/DeDLOC