Search | arXiv e-print repository

Understanding the Effects of Projectors in Knowledge Distillation

Authors: Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Brano Kusy, Zi Huang

Abstract: Conventionally, during the knowledge distillation process (e.g. feature distillation), an additional projector is often required to perform feature transformation due to the dimension mismatch between the teacher and the student networks. Interestingly, we discovered that even if the student and the teacher have the same feature dimensions, adding a projector still helps to improve the distillation performance. In addition, projectors even improve logit distillation if we add them to the architecture too. Inspired by these surprising findings and the general lack of understanding of projectors in the knowledge distillation process in the existing literature, this paper investigates the implicit role that projectors play but that has so far been overlooked. Our empirical study shows that the student with a projector (1) obtains a better trade-off between the training accuracy and the testing accuracy compared to the student without a projector when it has the same feature dimensions as the teacher, (2) better preserves its similarity to the teacher beyond shallow and numeric resemblance, from the view of Centered Kernel Alignment (CKA), and (3) avoids being over-confident as the teacher does at the testing phase. Motivated by the positive effects of projectors, we propose a projector ensemble-based feature distillation method to further improve distillation performance. Despite the simplicity of the proposed strategy, empirical results from the evaluation of classification tasks on benchmark datasets demonstrate the superior classification performance of our method on a broad range of teacher-student pairs and verify from the aspects of CKA and model calibration that the student's features are of improved quality with the projector ensemble design.
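The abstract above compares student and teacher features through Centered Kernel Alignment (CKA). As a rough illustration only, the following sketch computes the common linear form of CKA for two small feature matrices (one row per example). The function names and the toy data are assumptions made for this example and are not taken from the paper, which may use a different CKA variant.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature is zero-centred."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two feature matrices with one row per example."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(yc, xc)
    denominator = math.sqrt(
        cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc)
    )
    return numerator / denominator


# Hypothetical toy features: the "teacher" is the "student" scaled by 2.
student = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
teacher = [[2 * a, 2 * b] for a, b in student]
similarity = linear_cka(student, teacher)
```

Linear CKA is invariant to isotropic scaling and to orthogonal transformations of the features, so `similarity` here equals 1 up to floating-point rounding even though the two matrices differ numerically.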

Submitted 26 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:2210.15274

arXiv:2210.15274 [pdf, other]

Improved Feature Distillation via Projector Ensemble

Authors: Yudong Chen, Sen Wang, Jiajun Liu, Xuwei Xu, Frank de Hoog, Zi Huang

Abstract: In knowledge distillation, previous feature distillation methods mainly focus on the design of loss functions and the selection of the distilled layers, while the effect of the feature projector between the student and the teacher remains under-explored. In this paper, we first discuss a plausible mechanism of the projector with empirical evidence and then propose a new feature distillation method based on a projector ensemble for further performance improvement. We observe that the student network benefits from a projector even if the feature dimensions of the student and the teacher are the same. Training a student backbone without a projector can be considered as a multi-task learning process, namely achieving discriminative feature extraction for classification and feature matching between the student and the teacher for distillation at the same time. We hypothesize and empirically verify that without a projector, the student network tends to overfit the teacher's feature distributions despite having a different architecture and weight initialization. This leads to degradation in the quality of the student's deep features that are eventually used in classification. Adding a projector, on the other hand, disentangles the two learning tasks and helps the student network to focus better on the main feature extraction task while still being able to utilize teacher features as guidance through the projector. Motivated by the positive effect of the projector in feature distillation, we propose an ensemble of projectors to further improve the quality of student features. Experimental results on different datasets with a series of teacher-student pairs illustrate the effectiveness of the proposed method.
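To make the idea of a projector ensemble concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the projector architecture, how the ensemble members are combined, or the loss. This example assumes linear projectors without bias, a plain average of their outputs, and a mean-squared-error loss. All names and numbers are hypothetical.

```python
def project(features, weight):
    """Apply one linear projector: one output value per row of `weight`."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def ensemble_project(features, projectors):
    """Average the outputs of several projectors applied to the same features."""
    outputs = [project(features, w) for w in projectors]
    return [sum(vals) / len(outputs) for vals in zip(*outputs)]


def feature_distillation_loss(student_feat, teacher_feat, projectors):
    """Mean squared error between the ensembled projection and the teacher."""
    projected = ensemble_project(student_feat, projectors)
    squared = [(p - t) ** 2 for p, t in zip(projected, teacher_feat)]
    return sum(squared) / len(teacher_feat)


# Hypothetical student feature vector and two 2x2 projectors.
student_feat = [1.0, 2.0]
projectors = [
    [[1.0, 0.0], [0.0, 1.0]],  # identity projector
    [[3.0, 0.0], [0.0, 1.0]],  # scales the first feature by 3
]
loss_matched = feature_distillation_loss(student_feat, [2.0, 2.0], projectors)
loss_mismatched = feature_distillation_loss(student_feat, [2.0, 4.0], projectors)
```

In this sketch the loss is computed on the projected features only, so the student's own features are never forced to equal the teacher's directly. That mirrors the abstract's description of the teacher guiding the student through the projector.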

Submitted 28 February, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2004.05725 [pdf, other]

doi 10.1371/journal.pone.0241612

Vaccination strategies on dynamic networks with indirect transmission links and limited contact information

Authors: Md Shahzamal, Raja Jurdak, Bernard Mans, Frank de Hoog, Dean Paini

Abstract: Infectious diseases are still a major global burden for modern society, causing 13 million deaths annually. One way to reduce the morbidity and mortality rates from infectious diseases is through preventative or targeted vaccinations. Current vaccination strategies, however, rely on highly specific individual contact information that is difficult and costly to obtain, in order to identify influential spreading individuals. Current approaches also focus only on direct contacts between individuals for spreading, and disregard indirect transmission where a pathogen can spread between one infected individual and one susceptible individual that visit the same location within a short time-frame without meeting. This paper presents a novel vaccination strategy that relies on coarse-grained contact information, both direct and indirect, that can be easily and efficiently collected. Rather than tracking exact contact degrees of individuals, our strategy uses the types of places people visit to estimate a range of contact degrees for individuals, considering both direct and indirect contacts. We conduct extensive simulations to evaluate the performance of our strategy in comparison to state-of-the-art vaccination strategies. Results show that our strategy achieves comparable performance to the oracle approach and outperforms all existing strategies when considering indirect links.

Submitted 12 April, 2020; originally announced April 2020.

arXiv:1911.03811 [pdf, other]

Generating dynamic contact graphs with indirect links

Authors: Md Shahzamal, Raja Jurdak, Bernard Mans, Frank De Hoog, Dean Paini

Abstract: Graph models are widely used to study diffusion processes in contact networks. Recent data-driven research has highlighted the significance of indirect links, where interactions are possible when two nodes visit the same place at different times (SPDT), in determining network structure and diffusion dynamics. However, how to generate dynamic graphs with indirect links for modeling diffusion remains an unsolved challenge. Here, we present a dynamic contact graph model for generating contact networks with direct and indirect links. Our model introduces the concept of multiple concurrently active copies of a node for capturing indirect transmission links. The SPDT graph model builds on activity-driven time-varying network modelling for generating dynamic contact networks using simple statistical distributions. This model is fitted with a large city-scale empirical dataset using maximum likelihood estimation methods. Finally, the performance of the model is evaluated by analysing its capability of capturing the network properties observed in empirical graphs constructed using the location updates of a social networking app and by simulating SPDT diffusion processes. Our results show that, in comparison to current graph models that only include direct links, our graph model with indirect links matches empirical network properties and diffusion dynamics much more closely.

Submitted 9 November, 2019; originally announced November 2019.

Comments: 32 pages, under review

MSC Class: 90B15

arXiv:1906.02405 [pdf, other]

Indirect interactions influence contact network structure and diffusion dynamics

Authors: Md Shahzamal, Raja Jurdak, Bernard Mans, Frank de Hoog

Abstract: Interaction patterns at the individual level influence the behaviour of diffusion over contact networks. Most of the current diffusion models only consider direct interactions among individuals to build the underlying transmission networks of infectious items. However, delayed indirect interactions, where a susceptible individual interacts with infectious items after the infected individual has left the interaction space, can also cause transmission events. We define a diffusion model called the same place different time transmission (SPDT) based diffusion that considers transmission links for these indirect interactions. Our SPDT model changes the network dynamics where the connectivity among individuals varies with the decay rates of link infectivity. We investigate SPDT diffusion behaviours by simulating airborne disease spreading on data-driven contact networks. The SPDT model significantly increases diffusion dynamics (particularly for networks with low link densities where indirect interactions create new infection pathways) and is capable of producing a realistic disease reproduction number. Our results show that the SPDT model is significantly more likely to lead to outbreaks compared to current diffusion models with direct interactions. We find that the diffusion dynamics that include indirect links are not reproducible by the current models, highlighting the importance of the indirect links for predicting outbreaks.

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1806.03386 [pdf, other]

A Graph Model with Indirect Co-location Links

Authors: Md Shahzamal, Raja Jurdak, Bernard Mans, Frank de Hoog

Abstract: Graph models are widely used to analyse diffusion processes embedded in social contacts and to develop applications. A range of graph models are available to replicate the underlying social structures and dynamics realistically. However, most of the current graph models can only consider concurrent interactions among individuals in the co-located interaction networks. They do not account for indirect interactions that can transmit spreading items to individuals who visit the same locations at different times but within a certain time limit. The diffusion phenomenon occurring through direct and indirect interactions is called same place different time (SPDT) diffusion. This paper introduces a model to synthesize co-located interaction graphs capturing both direct interactions, where individuals meet at a location, and indirect interactions, where individuals visit the same location at different times within a set timeframe. We analyze 60 million location updates made by 2 million users from a social networking application to characterize the graph properties, including the space-time correlations and their time-evolving characteristics, such as bursty or ongoing behaviors. The generated synthetic graph reproduces diffusion dynamics of a realistic contact graph, and reduces the prediction error by up to 82% when compared to other contact graph models, demonstrating its potential for forecasting epidemic spread.
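The distinction between direct and indirect interactions described above can be sketched in a few lines. This is a simplified illustration, not the paper's graph model. It assumes each visit is a hypothetical `(user, place, arrive, leave)` record, treats overlapping stays at the same place as direct links, and treats non-overlapping stays separated by at most `max_delay` time units as indirect links.

```python
from itertools import combinations


def classify_links(visits, max_delay):
    """Return (direct, indirect) sets of user pairs from visit records.

    visits: list of (user, place, arrive, leave) tuples with arrive <= leave.
    max_delay: longest gap between one user's departure and the other's
        arrival that still counts as an indirect link (an assumed parameter).
    """
    direct, indirect = set(), set()
    for (u1, p1, a1, l1), (u2, p2, a2, l2) in combinations(visits, 2):
        if u1 == u2 or p1 != p2:
            continue  # links need two different users at the same place
        pair = frozenset((u1, u2))
        if a1 <= l2 and a2 <= l1:
            direct.add(pair)  # the two stays overlap in time
        else:
            # One user arrived after the other left; measure that gap.
            gap = a2 - l1 if a2 > l1 else a1 - l2
            if gap <= max_delay:
                indirect.add(pair)
    return direct, indirect


# Hypothetical visit log; times are in arbitrary units.
visits = [
    ("a", "cafe", 0, 10),
    ("b", "cafe", 5, 15),
    ("c", "cafe", 20, 30),
    ("d", "cafe", 100, 110),
    ("e", "park", 0, 10),
]
direct, indirect = classify_links(visits, max_delay=10)
```

With this log, only users `a` and `b` are co-located, while `c` arrives shortly after both have left and so picks up indirect links to them. A model restricted to direct links would miss those two pairs entirely.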

Submitted 26 July, 2018; v1 submitted 8 June, 2018; originally announced June 2018.

Comments: MLG2018, 14th International Workshop on Mining and Learning with Graphs (as part of KDD2018), London, UK

arXiv:1803.07968 [pdf, other]

Impact of Indirect Contacts in Emerging Infectious Disease on Social Networks

Authors: Md Shahzamal, Raja Jurdak, Bernard Mans, Ahmad El Shoghri, Frank De Hoog

Abstract: Interaction patterns among individuals play vital roles in spreading infectious diseases. Understanding these patterns and integrating their impact in modeling diffusion dynamics of infectious diseases are important for epidemiological studies. Current network-based diffusion models assume that diseases transmit through interactions where both infected and susceptible individuals are co-located at the same time. However, there are several infectious diseases that can transmit when a susceptible individual visits a location after an infected individual has left. Recently, we introduced a diffusion model called same place different time (SPDT) transmission to capture the indirect transmissions that happen when an infected individual leaves before a susceptible individual's arrival, along with direct transmissions. In this paper, we demonstrate how these indirect transmission links significantly enhance the emergence of infectious diseases by simulating airborne disease spreading on a synthetic social contact network. We denote individuals having indirect links but no direct links during their infectious periods as hidden spreaders. Our simulation shows that indirect links play roles similar to those of direct links, and that a single hidden spreader can cause a large outbreak in the SPDT model while causing no infection in the current model based on direct links. Our work opens a new direction in modeling infectious diseases.

Submitted 30 March, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

Comments: Workshop on Big Data Analytics for Social Computing, 2018

arXiv:1512.00901 [pdf, other]

Compressive hyperspectral imaging via adaptive sampling and dictionary learning

Authors: Mingrui Yang, Frank de Hoog, Yuqi Fan, Wen Hu

Abstract: In this paper, we propose a new sampling strategy for hyperspectral signals that is based on dictionary learning and singular value decomposition (SVD). Specifically, we first learn a sparsifying dictionary from training spectral data using dictionary learning. We then perform an SVD on the dictionary and use the first few left singular vectors as the rows of the measurement matrix to obtain the compressive measurements for reconstruction. The proposed method provides significant improvement over the conventional compressive sensing approaches. The reconstruction performance is further improved by reconditioning the sensing matrix using matrix balancing. We also demonstrate that the combination of dictionary learning and SVD is robust by applying them to different datasets.

Submitted 2 December, 2015; originally announced December 2015.

arXiv:1511.02928 [pdf]

doi 10.1109/TIP.2016.2614131

Hyperspectral Image Recovery via Hybrid Regularization

Authors: Reza Arablouei, Frank de Hoog

Abstract: Natural images tend to mostly consist of smooth regions with individual pixels having highly correlated spectra. This information can be exploited to recover hyperspectral images of natural scenes from their incomplete and noisy measurements. To perform the recovery while taking full advantage of the prior knowledge, we formulate a composite cost function containing a square-error data-fitting term and two distinct regularization terms pertaining to spatial and spectral domains. The regularization for the spatial domain is the sum of the total variation of the image frames corresponding to all spectral bands. The regularization for the spectral domain is the l1-norm of the coefficient matrix obtained by applying a suitable sparsifying transform to the spectra of the pixels. We use an accelerated proximal-subgradient method to minimize the formulated cost function. We analyze the performance of the proposed algorithm and prove its convergence. Numerical simulations using real hyperspectral images show that the proposed algorithm offers an excellent recovery performance with a number of measurements that is only a small fraction of the hyperspectral image data size. Simulation results also show that the proposed algorithm significantly outperforms an accelerated proximal-gradient algorithm that solves the classical basis-pursuit denoising problem to recover the hyperspectral image.

Submitted 25 August, 2016; v1 submitted 9 November, 2015; originally announced November 2015.

arXiv:1501.04621 [pdf]

Sparse Bayesian Learning for EEG Source Localization

Authors: Sajib Saha, Frank de Hoog, Ya. I. Nesterets, Rajib Rana, M. Tahtali, T. E. Gureyev

Abstract: Purpose: Localizing the sources of electrical activity from electroencephalographic (EEG) data has gained considerable attention over the last few years. In this paper, we propose an innovative source localization method for EEG, based on Sparse Bayesian Learning (SBL). Methods: To better specify the sparsity profile and to ensure efficient source localization, the proposed approach considers grouping of the electrical current dipoles inside the human brain. SBL is used to solve the localization problem together with the imposed constraint that the electric current dipoles associated with the brain activity are isotropic. Results: Numerical experiments are conducted on a realistic head model that is obtained by segmentation of MRI images of the head and includes four major components, namely the scalp, the skull, the cerebrospinal fluid (CSF) and the brain, with appropriate relative conductivity values. The results demonstrate that the isotropy constraint significantly improves the performance of SBL. In a noiseless environment, the proposed method was found to accurately (with accuracy of >75%) locate up to 6 simultaneously active sources, whereas for SBL without the isotropy constraint, the accuracy of finding just 3 simultaneously active sources was <75%. Conclusions: Compared to the state-of-the-art algorithms, the proposed method is potentially more consistent in specifying the sparsity profile of human brain activity and is able to produce better source localization for EEG.

Submitted 19 January, 2015; originally announced January 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1406.2434

arXiv:1405.3354 [pdf, other]

New Coherence and RIP Analysis for Weak Orthogonal Matching Pursuit

Authors: Mingrui Yang, Frank de Hoog

Abstract: In this paper we define a new coherence index, named the global 2-coherence, of a given dictionary and study its relationship with the traditional mutual coherence and the restricted isometry constant. By exploring this relationship, we obtain more general results on sparse signal reconstruction using greedy algorithms in the compressive sensing (CS) framework. In particular, we obtain an improved bound over the best known results on the restricted isometry constant for successful recovery of sparse signals using orthogonal matching pursuit (OMP).
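For readers unfamiliar with the traditional mutual coherence that the abstract takes as its reference point, the sketch below computes it for a small dictionary: the largest absolute inner product between any two distinct unit-normalised atoms. It does not implement the paper's global 2-coherence, whose definition is not given in the abstract. The function name and the example dictionary are assumptions made for illustration.

```python
import math
from itertools import combinations


def mutual_coherence(atoms):
    """Largest absolute inner product between distinct unit-normalised atoms.

    atoms: list of dictionary columns, each a list of floats (non-zero).
    """
    unit = []
    for atom in atoms:
        norm = math.sqrt(sum(x * x for x in atom))
        unit.append([x / norm for x in atom])
    return max(
        abs(sum(x * y for x, y in zip(a, b)))
        for a, b in combinations(unit, 2)
    )


# Hypothetical dictionary: two orthogonal atoms plus one diagonal atom.
dictionary = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mu = mutual_coherence(dictionary)
```

Here the diagonal atom makes a 45-degree angle with each axis atom, so `mu` equals 1/sqrt(2). An orthonormal dictionary would give 0, and lower coherence generally makes recovery guarantees for greedy algorithms such as OMP easier to satisfy.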

Submitted 13 May, 2014; originally announced May 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1307.1949

arXiv:1307.1949 [pdf, other]

Orthogonal Matching Pursuit with Thresholding and its Application in Compressive Sensing

Authors: Mingrui Yang, Frank de Hoog

Abstract: Greed is good. However, the tighter you squeeze, the less you have. In this paper, a less greedy algorithm for sparse signal reconstruction in compressive sensing, named orthogonal matching pursuit with thresholding, is studied. Using the global 2-coherence, which provides a "bridge" between the well-known mutual coherence and the restricted isometry constant, the performance of orthogonal matching pursuit with thresholding is analyzed and more general results for sparse signal reconstruction are obtained. It is also shown that given the same assumption on the coherence index and the restricted isometry constant as required for orthogonal matching pursuit, the thresholding variation gives exactly the same reconstruction performance with significantly less complexity.

Submitted 1 July, 2015; v1 submitted 8 July, 2013; originally announced July 2013.

Showing 1–12 of 12 results for author: De Hoog, F