Search | arXiv e-print repository

SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Authors: Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang

Abstract: The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations $T$). Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients, an assumption that other asynchronous decentralized FL algorithms require. Although SWIFT achieves the same iteration convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.

Submitted 25 October, 2022; originally announced October 2022.

Comments: 30 pages, 9 figures
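To make the wait-free idea concrete, here is a minimal, hypothetical simulation in plain Python. It is not SWIFT itself: the ring topology, the equal-weight averaging with neighbors' last published models, the scalar quadratic losses f_i(x) = ½(x − a_i)², and the names `simulate_wait_free` and `synchronous_iterations` are all illustrative assumptions. Each client finishes a local step on its own clock and mixes with whatever neighbor models are currently visible instead of waiting; the synchronous baseline counts how many client iterations fit in the same time budget when every round waits for the slowest client. Communication cost is ignored.

```python
import heapq
from typing import List, Tuple


def synchronous_iterations(step_times: List[float], budget: float) -> int:
    """Client iterations completed when every round waits for the slowest client."""
    rounds = int(budget // max(step_times))
    return rounds * len(step_times)


def simulate_wait_free(
    targets: List[float], step_times: List[float], budget: float, lr: float = 0.1
) -> Tuple[List[float], List[int]]:
    """Hypothetical wait-free gossip on a ring; client i minimizes 0.5 * (x - a_i)^2."""
    n = len(targets)
    models = [0.0] * n      # private model of each client
    published = [0.0] * n   # last model each client made visible to its neighbors
    iterations = [0] * n
    # Event queue of (finish_time, client): each client runs on its own clock.
    events = [(t, i) for i, t in enumerate(step_times) if t <= budget]
    heapq.heapify(events)
    while events:
        now, i = heapq.heappop(events)
        # Local gradient step on f_i (the gradient of 0.5 * (x - a_i)^2 is x - a_i).
        models[i] -= lr * (models[i] - targets[i])
        # Wait-free mixing: read the neighbors' latest published models, possibly stale.
        left, right = published[(i - 1) % n], published[(i + 1) % n]
        models[i] = (models[i] + left + right) / 3.0
        published[i] = models[i]
        iterations[i] += 1
        if now + step_times[i] <= budget:
            heapq.heappush(events, (now + step_times[i], i))
    return models, iterations


if __name__ == "__main__":
    targets = [0.0, 2.0, 4.0, 6.0]
    step_times = [1.0, 1.0, 1.0, 4.0]  # the last client is four times slower
    models, iters = simulate_wait_free(targets, step_times, budget=100.0)
    print("wait-free iterations:", iters, "total", sum(iters))
    print("synchronous iterations:", synchronous_iterations(step_times, 100.0))
    print("models:", [round(m, 3) for m in models])
```

With these assumed step times, the wait-free run completes 325 client iterations within the budget versus 100 for the synchronous baseline. That is the kind of run-time effect the abstract attributes to the wait-free structure; the sketch says nothing about per-iteration convergence.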

arXiv:2108.09264

Practical and Fast Momentum-Based Power Methods

Authors: Tahseen Rabbani, Apollo Jain, Arjun Rajkumar, Furong Huang

Abstract: The power method is a classical algorithm with broad applications in machine learning tasks, including streaming PCA, spectral clustering, and low-rank matrix approximation. In its distilled form, the vanilla power method determines the largest eigenvalue (in absolute value) of a matrix and its corresponding eigenvector. A momentum-based scheme can be used to accelerate the power method, but achieving an optimal convergence rate with existing algorithms critically relies on additional spectral information that is unavailable at run-time, and sub-optimal initializations can result in divergence. In this paper, we provide a pair of novel momentum-based power methods, which we call the delayed momentum power method (DMPower) and a streaming variant, the delayed momentum streaming method (DMStream). Our methods leverage inexact deflation and are capable of achieving near-optimal convergence with far less restrictive hyperparameter requirements. We provide convergence analyses for both algorithms through the lens of perturbation theory. Further, we experimentally demonstrate that DMPower routinely outperforms the vanilla power method and that both algorithms match the convergence speed of an oracle running existing accelerated methods with perfect spectral knowledge.

Submitted 20 August, 2021; originally announced August 2021.
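For context on the baseline the abstract refers to, the sketch below implements the vanilla power method and the heavy-ball momentum power iteration x_{t+1} = A x_t − β x_{t−1} in plain Python. It is not DMPower or DMStream: it uses the classical choice β = λ₂²/4, which requires the second eigenvalue λ₂ in advance, exactly the kind of spectral information the abstract notes is unavailable at run-time. The function name `power_method` and the 2×2 example matrix are illustrative assumptions; setting β = 0 recovers the vanilla method.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def power_method(A: Matrix, x0: Vector, iters: int, beta: float = 0.0) -> Tuple[float, Vector]:
    """Power iteration with optional heavy-ball momentum: y = A x_t - beta * x_{t-1}."""
    x_prev = [0.0] * len(x0)
    nrm = math.sqrt(sum(v * v for v in x0))
    x = [v / nrm for v in x0]
    for _ in range(iters):
        y = [ax - beta * xp for ax, xp in zip(matvec(A, x), x_prev)]
        nrm = math.sqrt(sum(v * v for v in y))
        # Rescale both iterates by the same factor so the three-term recurrence is preserved.
        x_prev, x = [v / nrm for v in x], [v / nrm for v in y]
    # The Rayleigh quotient of the unit vector x estimates the dominant eigenvalue.
    eig = sum(xi * axi for xi, axi in zip(x, matvec(A, x)))
    return eig, x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
    lam2 = 1.0                    # assumed known in advance: the "oracle" information
    for beta in (0.0, lam2 ** 2 / 4):
        eig, vec = power_method(A, [1.0, 0.0], iters=50, beta=beta)
        print(f"beta={beta}: eigenvalue ~ {eig:.6f}, vector ~ {[round(v, 6) for v in vec]}")
```

Both settings should recover λ₁ = 3 and an eigenvector proportional to (1, 1); the momentum variant only has its acceleration guarantee here because λ₂ was supplied by hand.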

arXiv:2004.01214

doi: 10.2140/ant.2023.17.93

Constructions of difference sets in nonabelian 2-groups

Authors: Taylor Applebaum, John Clikeman, James A. Davis, John F. Dillon, Jonathan Jedwab, Tahseen Rabbani, Ken Smith, William Yolland

Abstract: Difference sets have been studied for more than 80 years. Techniques from algebraic number theory, group theory, finite geometry, and digital communications engineering have been used to establish constructive and nonexistence results. We provide a new theoretical approach which dramatically expands the class of $2$-groups known to contain a difference set, by refining the concept of covering extended building sets introduced by Davis and Jedwab in 1997. We then describe how product constructions and other methods can be used to construct difference sets in some of the remaining $2$-groups. We announce the completion of ten years of collaborative work to determine precisely which of the 56,092 nonisomorphic groups of order 256 contain a difference set. All groups of order 256 not excluded by the two classical nonexistence criteria are found to contain a difference set, in agreement with previous findings for groups of order 4, 16, and 64. We provide suggestions for how the existence question for difference sets in $2$-groups of all orders might be resolved.

Submitted 13 January, 2022; v1 submitted 2 April, 2020; originally announced April 2020.

Comments: 31 pages, 3 figures, 2 tables. New Section 5 gives details of the computer implementation for groups of order 256. Section 4 has been updated to reflect further streamlining of the search procedures.

MSC Class: 05B10; 05E18 (primary)

Journal ref: Alg. Number Th. 17 (2023) 93-130
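The defining property of a (v, k, λ) difference set D in a group G of order v is that every non-identity element g can be written as g = d₁d₂⁻¹ with d₁, d₂ ∈ D in exactly λ ways. The brute-force checker below is a hypothetical illustration of that definition, not the paper's search procedure. It takes the group operation and inverse as functions, so it applies to nonabelian groups as well, but it is only exercised here on the elementary abelian group (Z₂)⁴ of order 16. The example set is the support of the bent function x₁x₂ + x₃x₄, a standard (16, 6, 2) difference set.

```python
from collections import Counter
from itertools import product
from typing import Callable, Hashable, Iterable, Optional, Sequence, TypeVar

G = TypeVar("G", bound=Hashable)


def difference_set_lambda(
    group: Iterable[G],
    D: Sequence[G],
    mul: Callable[[G, G], G],
    inv: Callable[[G], G],
    identity: G,
) -> Optional[int]:
    """Return lambda if D is a difference set in the group, else None."""
    counts: Counter = Counter()
    for d1 in D:
        for d2 in D:
            if d1 != d2:  # d1 * d2^{-1} equals the identity exactly when d1 == d2
                counts[mul(d1, inv(d2))] += 1
    lambdas = {counts[g] for g in group if g != identity}
    return lambdas.pop() if len(lambdas) == 1 else None


# The elementary abelian group (Z_2)^4: elements are bit 4-tuples, the operation is XOR,
# and every element is its own inverse.
V4 = list(product((0, 1), repeat=4))


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def self_inverse(a):
    return a


if __name__ == "__main__":
    D = [x for x in V4 if (x[0] * x[1] + x[2] * x[3]) % 2 == 1]
    lam = difference_set_lambda(V4, D, xor, self_inverse, (0, 0, 0, 0))
    print(len(V4), len(D), lam)  # parameters (v, k, lambda) = (16, 6, 2)
```

The parameters satisfy the counting identity k(k − 1) = λ(v − 1), here 6·5 = 2·15; a subset that fails the definition, such as the first six elements of V4, makes the function return None.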

arXiv:1009.4219

Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems

Authors: Laurent El Ghaoui, Vivian Viallon, Tarek Rabbani

Abstract: We describe a fast method to eliminate features (variables) in l1-penalized least-squares regression (or LASSO) problems. The elimination of features leads to a potentially substantial reduction in running time, especially for large values of the penalty parameter. Our method is not heuristic: it only eliminates features that are guaranteed to be absent after solving the LASSO problem. The feature elimination step is easy to parallelize and can test each feature for elimination independently. Moreover, the computational effort of our method is negligible compared to that of solving the LASSO problem: roughly, it is the same as a single gradient step. Our method extends the scope of existing LASSO algorithms to larger data sets that were previously out of their reach. We show how our method can be extended to general l1-penalized convex problems and present preliminary results for the Sparse Support Vector Machine and Logistic Regression problems.

Submitted 18 May, 2011; v1 submitted 21 September, 2010; originally announced September 2010.

Comments: Submitted to JMLR in April 2011
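As an illustration of what a non-heuristic elimination test looks like, the sketch below implements the basic SAFE rule for the LASSO min_w ½‖Xw − y‖² + λ‖w‖₁ as it is commonly restated: with λ_max = max_k |x_kᵀy|, feature k is discarded when |x_kᵀy| < λ − ‖x_k‖₂‖y‖₂(λ_max − λ)/λ_max. This is only the basic form of the test; the function name `safe_screen` and the toy orthonormal design are illustrative assumptions, and the code assumes y is not orthogonal to every column (so λ_max > 0). Each feature is tested independently with one inner product and one norm, which is why the step is cheap and easy to parallelize.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def safe_screen(columns: List[List[float]], y: List[float], lam: float) -> List[int]:
    """Return indices of features that the basic SAFE test cannot eliminate.

    Assumes the LASSO objective 0.5 * ||X w - y||^2 + lam * ||w||_1, with X given
    column by column, lam > 0, and y not orthogonal to every column.
    """
    corr = [abs(dot(x, y)) for x in columns]
    lam_max = max(corr)  # for lam >= lam_max the LASSO solution is w = 0
    y_norm = math.sqrt(dot(y, y))
    kept = []
    for k, x in enumerate(columns):
        threshold = lam - math.sqrt(dot(x, x)) * y_norm * (lam_max - lam) / lam_max
        if corr[k] >= threshold:  # strictly below the threshold means safely removed
            kept.append(k)
    return kept


if __name__ == "__main__":
    # Orthonormal toy design: the columns are the standard basis of R^3.
    columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = [3.0, 1.0, 0.5]
    for lam in (2.5, 1.5):
        print(lam, safe_screen(columns, y, lam))
```

On this orthonormal design the LASSO solution is soft-thresholding, w_k = sign(x_kᵀy)·max(|x_kᵀy| − λ, 0), so at λ = 2.5 only feature 0 is nonzero, matching what the test keeps. At λ = 1.5 the test discards nothing even though features 1 and 2 are zero in the solution, showing that the rule is conservative rather than exact.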

arXiv:1009.3515

Safe Feature Elimination in Sparse Supervised Learning

Authors: Laurent El Ghaoui, Vivian Viallon, Tarek Rabbani

Abstract: We investigate fast methods that quickly eliminate variables (features) in supervised learning problems involving a convex loss function and an $l_1$-norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm. The methods are not heuristic: they only eliminate features that are guaranteed to be absent after solving the learning problem. Our framework applies to a large class of problems, including support vector machine classification, logistic regression and least-squares. The complexity of the feature elimination step is negligible compared to the typical computational effort involved in the sparse supervised learning problem: it grows linearly with the number of features times the number of examples, and is much lower if the data are sparse. We apply our method to data sets arising in text classification and observe a dramatic reduction of the dimensionality, and hence of the computational effort required to solve the learning problem, especially when very sparse classifiers are sought. Our method immediately extends the scope of existing algorithms, allowing us to run them on data sets whose sizes were previously out of their reach.

Submitted 26 October, 2010; v1 submitted 17 September, 2010; originally announced September 2010.

Comments: A newer version is available as arXiv:1009.4219.

Showing 1–5 of 5 results for author: Rabbani, T