A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge

Dahri, Tarique; Memon, Zulfiqar Ali; Yu, Zhenyu; Idris, Mohd. Yamani Idna; Khan, Sheheryar; Ahmad, Sadiq; Shoman, Maged; Aziz, Saddam; Qureshi, Rizwan

arXiv:2506.07055 (cs)

[Submitted on 8 Jun 2025]

Abstract: We introduce the Layered Self-Supervised Knowledge Distillation (LSSKD) framework for training compact deep learning models. Unlike traditional methods that rely on pre-trained teacher networks, our approach appends auxiliary classifiers to intermediate feature maps, generating diverse self-supervised knowledge and enabling one-to-one transfer across different network stages. On CIFAR-100, our method achieves an average improvement of 4.54% over the state-of-the-art PS-KD method and a 1.14% gain over SSKD; on ImageNet, it improves on HASSKD by 0.32%. Experiments on Tiny ImageNet and CIFAR-100 under few-shot learning scenarios also achieve state-of-the-art results. These findings demonstrate that our approach improves model generalization and performance without the need for large, over-parameterized teacher networks. Importantly, all auxiliary classifiers can be removed at the inference stage, so they add no extra computational cost. This makes our model suitable for deploying small language models on affordable, low-compute devices. Owing to its lightweight design and adaptability, our framework is particularly suitable for multimodal sensing and cyber-physical environments that require efficient and responsive inference. LSSKD facilitates the development of intelligent agents capable of learning from limited sensory data under weak supervision.
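
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of a generic self-distillation loss with auxiliary classifiers, not the authors' exact LSSKD objective. It assumes that the final classifier acts as the soft-label teacher for each auxiliary head, and that the terms are combined with a temperature `temperature` and a weight `alpha`; both parameters and all function names are hypothetical and do not come from the paper. The sketch works on per-stage logits for a single sample and uses only the Python standard library.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against its ground-truth class index."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def layered_self_distillation_loss(stage_logits, label, temperature=3.0, alpha=0.5):
    """Hypothetical training loss for one sample.

    stage_logits lists the logits of every classifier head, ordered from the
    shallowest auxiliary head to the final classifier (last entry).
    Assumption: the final head supplies softened targets to each auxiliary head.
    """
    *aux_logits, final_logits = stage_logits
    teacher = softmax(final_logits, temperature)
    loss = cross_entropy(final_logits, label)
    for logits in aux_logits:
        student = softmax(logits, temperature)
        loss += cross_entropy(logits, label)
        # The T^2 factor is the usual scaling for temperature-softened targets.
        loss += alpha * temperature ** 2 * kl_divergence(teacher, student)
    return loss


def predict(stage_logits):
    """Inference uses only the final head; auxiliary heads are ignored."""
    final_logits = stage_logits[-1]
    return max(range(len(final_logits)), key=final_logits.__getitem__)
```

During training, `layered_self_distillation_loss` would be evaluated on the logits of every head. At inference only `predict` is needed, and it reads only the last entry, which mirrors the abstract's point that the auxiliary classifiers can be dropped at no extra cost.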

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2506.07055 [cs.CV]
	(or arXiv:2506.07055v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2506.07055

Submission history

From: Zhenyu Yu
[v1] Sun, 8 Jun 2025 09:30:48 UTC (2,109 KB)
