Computer Science > Machine Learning
[Submitted on 9 Oct 2020 (v1), last revised 19 Oct 2020 (this version, v2)]
Title: Locally Linear Region Knowledge Distillation
Abstract: Knowledge distillation (KD) is an effective technique for transferring knowledge from one neural network (the teacher) to another (the student), thereby improving the student's performance. To make the student better mimic the teacher's behavior, existing work focuses on designing different criteria to align their logits or representations. In contrast to these efforts, we address knowledge distillation from a novel data perspective. We argue that transferring knowledge only at sparse training data points does not enable the student to capture the local shape of the teacher function well. To address this issue, we propose locally linear region knowledge distillation ($\rm L^2$RKD), which transfers knowledge in local, linear regions from a teacher to a student. This is achieved by forcing the student to mimic the outputs of the teacher function in local, linear regions. As a result, the student is able to better capture the local shape of the teacher function and thus achieves better performance. Despite the simplicity of $\rm L^2$RKD, extensive experiments demonstrate that it is superior to the original KD in many respects: it outperforms KD and other state-of-the-art approaches by a large margin, shows robustness and superiority under few-shot settings, and is more compatible with existing distillation approaches, further improving their performance significantly.
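The idea of matching teacher outputs in a neighborhood of each training point, rather than only at the point itself, can be illustrated with a minimal sketch. The abstract does not say how the local, linear regions are constructed or which matching loss is used, so everything below is an assumption made for illustration: the sampler `sample_local_points` (points on a short segment from `x` along a random direction), the parameters `radius` and `num_points`, and the use of the standard temperature-softened KL divergence from the original KD formulation. It should not be read as the authors' implementation.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sample_local_points(x, radius, num_points, rng):
    """Hypothetical sampler (an assumption, not taken from the paper):
    draw points on a segment of length `radius` starting at x and
    running along one random unit direction."""
    direction = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    unit = [d / norm for d in direction]
    points = []
    for _ in range(num_points):
        step = rng.uniform(0.0, radius)
        points.append([xi + step * ui for xi, ui in zip(x, unit)])
    return points


def local_region_kd_loss(teacher, student, x, radius=0.1,
                         num_points=4, temperature=4.0, rng=None):
    """Average softened KL between teacher and student outputs at x
    and at extra points sampled in a local region around x."""
    rng = rng or random.Random(0)
    points = [x] + sample_local_points(x, radius, num_points, rng)
    losses = []
    for p in points:
        t = softmax(teacher(p), temperature)
        s = softmax(student(p), temperature)
        losses.append(kl_divergence(t, s))
    return sum(losses) / len(losses)


# Toy stand-ins for real networks: each maps a 2-D input to 2 logits.
def toy_teacher(p):
    return [p[0] + p[1], p[0] - p[1]]


def toy_student(p):
    return [0.5 * p[0], -0.5 * p[1]]


loss = local_region_kd_loss(toy_teacher, toy_student, [1.0, 2.0])
```

In a real training loop the teacher and student would be neural networks and this term would be minimized with respect to the student's parameters; plain KD corresponds to `num_points=0`, where only the training point itself is used.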
Submission history
From: Xiang Deng
[v1] Fri, 9 Oct 2020 21:23:53 UTC (8,120 KB)
[v2] Mon, 19 Oct 2020 08:47:58 UTC (5,385 KB)