Showing 1–1 of 1 results for author: Deb, U

Search v0.5.6 released 2020-02-24

arXiv:2405.08019 [pdf, other]

cs.LG cs.AI

AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting

Authors: Shreyan Ganguly, Roshan Nayak, Rakshith Rao, Ujan Deb, Prathosh AP

Abstract: Knowledge distillation, a widely used model compression technique, works by transferring knowledge from a cumbersome teacher model to a lightweight student model. The technique involves jointly optimizing the task-specific and knowledge distillation losses, with a weight assigned to each. Although these weights play a crucial role in the performance of the distillation process, current methods give equal weight to both losses, leading to suboptimal performance. In this paper, we propose Adaptive Knowledge Distillation, a novel technique inspired by curriculum learning that adaptively weighs the losses at the instance level. The technique rests on the notion that sample difficulty increases with teacher loss. Our method follows a plug-and-play paradigm and can be applied on top of any task-specific and distillation objectives. Experiments show that our method performs better than the conventional knowledge distillation method and existing instance-level loss functions.

Submitted 11 May, 2024; originally announced May 2024.
