Federated Knowledge Distillation

Seo, Hyowoon; Park, Jihong; Oh, Seungeun; Bennis, Mehdi; Kim, Seong-Lyun


arXiv:2011.02367 (cs)

[Submitted on 4 Nov 2020]



Abstract: Distributed learning frameworks often rely on exchanging model parameters across workers instead of revealing their raw data. A prime example is federated learning (FL), which exchanges the gradients or weights of each neural network model. Under limited communication resources, however, such a method becomes extremely costly, particularly for modern deep neural networks with a huge number of model parameters. In this regard, federated distillation (FD) is a compelling distributed learning solution that exchanges only the model outputs, whose dimensions are commonly much smaller than the model sizes (e.g., 10 labels in the MNIST dataset). The goal of this chapter is to provide a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks. To demystify the operational principle of FD, the first part of this chapter provides a novel asymptotic analysis of two foundational algorithms of FD, namely knowledge distillation (KD) and co-distillation (CD), by exploiting the theory of the neural tangent kernel (NTK). The second part elaborates on a baseline implementation of FD for a classification task and illustrates its performance in terms of accuracy and communication efficiency compared to FL. Lastly, to demonstrate the applicability of FD to various distributed learning tasks and environments, the third part presents two selected applications, namely FD over asymmetric uplink-and-downlink wireless channels and FD for reinforcement learning.
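The communication argument in the abstract can be made concrete with a small sketch. The abstract does not spell out the exchange protocol, so the code below assumes one common formulation of FD, in which each worker uploads its model outputs averaged per ground-truth label and a server averages them across workers. All function names are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of the payload exchanged in federated distillation (FD).
# Assumption: workers upload per-label averaged output vectors; the abstract
# itself only states that model outputs, not parameters, are exchanged.


def fl_payload(num_params: int) -> int:
    """Values uploaded per round in FL: one per model parameter."""
    return num_params


def fd_payload(num_labels: int) -> int:
    """Values uploaded per round in this FD sketch:
    one output vector of length num_labels for each label."""
    return num_labels * num_labels


def local_average_outputs(outputs, labels, num_labels):
    """Average a worker's output vectors separately for each ground-truth label.
    Labels the worker never observed are reported as None."""
    sums = [[0.0] * num_labels for _ in range(num_labels)]
    counts = [0] * num_labels
    for out, y in zip(outputs, labels):
        counts[y] += 1
        for k, v in enumerate(out):
            sums[y][k] += v
    return [
        [s / counts[y] for s in sums[y]] if counts[y] else None
        for y in range(num_labels)
    ]


def aggregate(worker_averages, num_labels):
    """Server side: average per-label outputs over the workers that saw each label."""
    global_avg = []
    for y in range(num_labels):
        rows = [w[y] for w in worker_averages if w[y] is not None]
        if not rows:
            global_avg.append(None)
            continue
        global_avg.append([sum(col) / len(rows) for col in zip(*rows)])
    return global_avg


# Toy usage with 2 labels and 2 workers (made-up output values).
w1 = local_average_outputs([[0.8, 0.2], [0.6, 0.4]], [0, 0], 2)
w2 = local_average_outputs([[0.5, 0.5], [0.1, 0.9]], [0, 1], 2)
global_outputs = aggregate([w1, w2], 2)
```

With 10 labels, `fd_payload(10)` is 100 values per upload regardless of model size, whereas `fl_payload` grows with the parameter count. In a full FD round, each worker would then use the aggregated outputs as soft targets in a distillation term of its local loss; that step is omitted here.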

Comments: 30 pages, 12 figures, 2 tables; This chapter is written for the forthcoming book, Machine Learning and Wireless Communications (Cambridge University Press), edited by H. V. Poor, D. Gunduz, A. Goldsmith, and Y. Eldar
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Cite as: arXiv:2011.02367 [cs.LG] (or arXiv:2011.02367v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2011.02367

Submission history

From: Jihong Park
[v1] Wed, 4 Nov 2020 15:56:13 UTC (1,900 KB)
