Search | arXiv e-print repository

LLM meets Vision-Language Models for Zero-Shot One-Class Classification

Authors: Yassir Bendou, Giulia Lioi, Bastien Pasdeloup, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux, Vincent Gripon

Abstract: We consider the problem of zero-shot one-class visual classification, extending traditional one-class classification to scenarios where only the label of the target class is available. This method aims to discriminate between positive and negative query samples without requiring examples from the target class. We propose a two-step solution that first queries large language models for visually con… ▽ More We consider the problem of zero-shot one-class visual classification, extending traditional one-class classification to scenarios where only the label of the target class is available. This method aims to discriminate between positive and negative query samples without requiring examples from the target class. We propose a two-step solution that first queries large language models for visually confusing objects and then relies on vision-language pre-trained models (e.g., CLIP) to perform classification. By adapting large-scale vision benchmarks, we demonstrate the ability of the proposed method to outperform adapted off-the-shelf alternatives in this setting. Namely, we propose a realistic benchmark where negative query samples are drawn from the same original dataset as positive ones, including a granularity-controlled version of iNaturalist, where negative samples are at a fixed distance in the taxonomy tree from the positive ones. To our knowledge, we are the first to demonstrate the ability to discriminate a single category from other semantically related ones using only its label. △ Less

Submitted 27 May, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2402.16830 [pdf, other]

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

Authors: Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, Mirco Ravanelli

Abstract: Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To enhance its efficiency, previous works often leverage the use of compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain… ▽ More Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To enhance its efficiency, previous works often leverage the use of compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that conducts distillation across groups of layers instead of distilling individual arbitrarily selected layers within the teacher network. The identification of the layers to distill is achieved through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M parameters model class across several SUPERB tasks. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted at the Self-supervision in Audio, Speech and Beyond (SASB) Workshop at ICASSP 2024

arXiv:2401.11311 [pdf, ps, other]

A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models

Authors: Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux

Abstract: Few-shot semantic segmentation (FSS) is a crucial challenge in computer vision, driving extensive research into a diverse range of methods, from advanced meta-learning techniques to simple transfer learning baselines. With the emergence of vision foundation models (VFM) serving as generalist feature extractors, we seek to explore the adaptation of these models for FSS. While current FSS benchmarks… ▽ More Few-shot semantic segmentation (FSS) is a crucial challenge in computer vision, driving extensive research into a diverse range of methods, from advanced meta-learning techniques to simple transfer learning baselines. With the emergence of vision foundation models (VFM) serving as generalist feature extractors, we seek to explore the adaptation of these models for FSS. While current FSS benchmarks focus on adapting pre-trained models to new tasks with few images, they emphasize in-domain generalization, making them less suitable for VFM trained on large-scale web datasets. To address this, we propose a novel realistic benchmark with a simple and straightforward adaptation process tailored for this task. Using this benchmark, we conduct a comprehensive comparative analysis of prominent VFM and semantic segmentation models. To evaluate their effectiveness, we leverage various adaption methods, ranging from linear probing to parameter efficient fine-tuning (PEFT) and full fine-tuning. Our findings show that models designed for segmentation can be outperformed by self-supervised (SSL) models. On the other hand, while PEFT methods yields competitive performance, they provide little discrepancy in the obtained results compared to other methods, highlighting the critical role of the feature extractor in determining results. To our knowledge, this is the first study on the adaptation of VFM for FSS. △ Less

Submitted 3 June, 2025; v1 submitted 20 January, 2024; originally announced January 2024.

arXiv:2311.14544 [pdf, other]

Inferring Latent Class Statistics from Text for Robust Visual Few-Shot Learning

Authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Giulia Lioi, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene

Abstract: In the realm of few-shot learning, foundation models like CLIP have proven effective but exhibit limitations in cross-domain robustness especially in few-shot settings. Recent works add text as an extra modality to enhance the performance of these models. Most of these approaches treat text as an auxiliary modality without fully exploring its potential to elucidate the underlying class visual feat… ▽ More In the realm of few-shot learning, foundation models like CLIP have proven effective but exhibit limitations in cross-domain robustness especially in few-shot settings. Recent works add text as an extra modality to enhance the performance of these models. Most of these approaches treat text as an auxiliary modality without fully exploring its potential to elucidate the underlying class visual features distribution. In this paper, we present a novel approach that leverages text-derived statistics to predict the mean and covariance of the visual feature distribution for each class. This predictive framework enriches the latent space, yielding more robust and generalizable few-shot learning models. We demonstrate the efficacy of incorporating both mean and covariance statistics in improving few-shot classification performance across various datasets. Our method shows that we can use text to predict the mean and covariance of the distribution offering promising improvements in few-shot learning scenarios. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

arXiv:2309.12854 [pdf, ps, other]

ThinResNet: A New Baseline for Structured Convolutional Networks Pruning

Authors: Hugo Tessier, Ghouti Boukli Hacene, Vincent Gripon

Abstract: Pruning is a compression method which aims to improve the efficiency of neural networks by reducing their number of parameters while maintaining a good performance, thus enhancing the performance-to-cost ratio in nontrivial ways. Of particular interest are structured pruning techniques, in which whole portions of parameters are removed altogether, resulting in easier to leverage shrunk architectur… ▽ More Pruning is a compression method which aims to improve the efficiency of neural networks by reducing their number of parameters while maintaining a good performance, thus enhancing the performance-to-cost ratio in nontrivial ways. Of particular interest are structured pruning techniques, in which whole portions of parameters are removed altogether, resulting in easier to leverage shrunk architectures. Since its growth in popularity in the recent years, pruning gave birth to countless papers and contributions, resulting first in critical inconsistencies in the way results are compared, and then to a collective effort to establish standardized benchmarks. However, said benchmarks are based on training practices that date from several years ago and do not align with current practices. In this work, we verify how results in the recent literature of pruning hold up against networks that underwent both state-of-the-art training methods and trivial model scaling. We find that the latter clearly and utterly outperform all the literature we compared to, proving that updating standard pruning benchmarks and re-evaluating classical methods in their light is an absolute necessity. We thus introduce a new challenging baseline to compare structured pruning to: ThinResNet. △ Less

Submitted 22 September, 2023; originally announced September 2023.

Comments: 11 pages, 2 figures, 3 tables

arXiv:2212.06461 [pdf, ps, other]

A Statistical Model for Predicting Generalization in Few-Shot Classification

Authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux, Ghouthi Boukli Hacene, Javier Alonso Garcia

Abstract: The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaus… ▽ More The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We empirically show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy. △ Less

Submitted 28 March, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2105.13331 [pdf, other]

doi 10.3390/s21092984

Quantization and Deployment of Deep Neural Networks on Microcontrollers

Authors: Pierre-Emmanuel Novac, Ghouthi Boukli Hacene, Alain Pegatoquet, Benoît Miramond, Vincent Gripon

Abstract: Embedding Artificial Intelligence onto low-power devices is a challenging task that has been partly overcome with recent advances in machine learning and hardware design. Presently, deep neural networks can be deployed on embedded targets to perform different tasks such as speech recognition,object detection or Human Activity Recognition. However, there is still room for optimization of deep neura… ▽ More Embedding Artificial Intelligence onto low-power devices is a challenging task that has been partly overcome with recent advances in machine learning and hardware design. Presently, deep neural networks can be deployed on embedded targets to perform different tasks such as speech recognition,object detection or Human Activity Recognition. However, there is still room for optimization of deep neural networks onto embedded devices. These optimizations mainly address power consumption,memory and real-time constraints, but also an easier deployment at the edge. Moreover, there is still a need for a better understanding of what can be achieved for different use cases. This work focuses on quantization and deployment of deep neural networks onto low-power 32-bit microcontrollers. The quantization methods, relevant in the context of an embedded execution onto a microcontroller, are first outlined. Then, a new framework for end-to-end deep neural networks training, quantization and deployment is presented. This framework, called MicroAI, is designed as an alternative to existing inference engines (TensorFlow Lite for Microcontrollers and STM32CubeAI). Our framework can indeed be easily adjusted and/or extended for specific use cases. Execution using single precision 32-bit floating-point as well as fixed-point on 8- and 16-bit integers are supported. The proposed quantization method is evaluated with three different datasets (UCI-HAR, Spoken MNIST and GTSRB). Finally, a comparison study between MicroAI and both existing embedded inference engines is provided in terms of memory and power efficiency. On-device evaluation is done using ARM Cortex-M4F-based microcontrollers (Ambiq Apollo3 and STM32L452RE). △ Less

Submitted 23 September, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: 34 pages, 14 figures. Published in MDPI Sensors 2021, special issue "Embedded Artificial Intelligence (AI) for Smart Sensing and IoT Applications": https://www.mdpi.com/1424-8220/21/9/2984 . v2: add reference for MicroAI software and link to source code repository; fix Eq. 3 according to implementation; improve English grammar and spelling; improve page layout

Journal ref: Sensors 2021, 21, 2984

arXiv:2103.13322 [pdf, other]

DNN Quantization with Attention

Authors: Ghouthi Boukli Hacene, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux

Abstract: Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop in accuracy, in particular when we apply it to complex learning tasks or lightweight DNN architectures. In this paper, we propose a training procedure that rel… ▽ More Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop in accuracy, in particular when we apply it to complex learning tasks or lightweight DNN architectures. In this paper, we propose a training procedure that relaxes the low-bit quantization. We call this procedure \textit{DNN Quantization with Attention} (DQA). The relaxation is achieved by using a learnable linear combination of high, medium and low-bit quantizations. Our learning procedure converges step by step to a low-bit quantization using an attention mechanism with temperature scheduling. In experiments, our approach outperforms other low-bit quantization techniques on various object recognition benchmarks such as CIFAR10, CIFAR100 and ImageNet ILSVRC 2012, achieves almost the same accuracy as a full precision DNN, and considerably reduces the accuracy drop when quantizing lightweight DNN architectures. △ Less

Submitted 24 March, 2021; originally announced March 2021.

arXiv:2101.04073 [pdf, other]

Deeplite Neutrino: An End-to-End Framework for Constrained Deep Learning Model Optimization

Authors: Anush Sankaran, Olivier Mastropietro, Ehsan Saboori, Yasser Idris, Davis Sawyer, MohammadHossein AskariHemmat, Ghouthi Boukli Hacene

Abstract: Designing deep learning-based solutions is becoming a race for training deeper models with a greater number of layers. While a large-size deeper model could provide competitive accuracy, it creates a lot of logistical challenges and unreasonable resource requirements during development and deployment. This has been one of the key reasons for deep learning models not being excessively used in vario… ▽ More Designing deep learning-based solutions is becoming a race for training deeper models with a greater number of layers. While a large-size deeper model could provide competitive accuracy, it creates a lot of logistical challenges and unreasonable resource requirements during development and deployment. This has been one of the key reasons for deep learning models not being excessively used in various production environments, especially in edge devices. There is an immediate requirement for optimizing and compressing these deep learning models, to enable on-device intelligence. In this research, we introduce a black-box framework, Deeplite Neutrino for production-ready optimization of deep learning models. The framework provides an easy mechanism for the end-users to provide constraints such as a tolerable drop in accuracy or target size of the optimized models, to guide the whole optimization process. The framework is easy to include in an existing production pipeline and is available as a Python Package, supporting PyTorch and Tensorflow libraries. The optimization performance of the framework is shown across multiple benchmark datasets and popular deep learning models. Further, the framework is currently used in production and the results and testimonials from several clients are summarized. △ Less

Submitted 13 January, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: "IAAI Deployed Application Award", IAAI 2021 @ AAAI 2021

arXiv:2012.01509 [pdf, ps, other]

DecisiveNets: Training Deep Associative Memories to Solve Complex Machine Learning Problems

Authors: Vincent Gripon, Carlos Lassance, Ghouthi Boukli Hacene

Abstract: Learning deep representations to solve complex machine learning tasks has become the prominent trend in the past few years. Indeed, Deep Neural Networks are now the golden standard in domains as various as computer vision, natural language processing or even playing combinatorial games. However, problematic limitations are hidden behind this surprising universal capability. Among other things, exp… ▽ More Learning deep representations to solve complex machine learning tasks has become the prominent trend in the past few years. Indeed, Deep Neural Networks are now the golden standard in domains as various as computer vision, natural language processing or even playing combinatorial games. However, problematic limitations are hidden behind this surprising universal capability. Among other things, explainability of the decisions is a major concern, especially since deep neural networks are made up of a very large number of trainable parameters. Moreover, computational complexity can quickly become a problem, especially in contexts constrained by real time or limited resources. Therefore, understanding how information is stored and the impact this storage can have on the system remains a major and open issue. In this chapter, we introduce a method to transform deep neural network models into deep associative memories, with simpler, more explicable and less expensive operations. We show through experiments that these transformations can be done without penalty on predictive performance. The resulting deep associative memories are excellent candidates for artificial intelligence that is easier to theorize and manipulate. △ Less

Submitted 2 December, 2020; originally announced December 2020.

Comments: Submitted as chapter proposal

arXiv:2007.10106 [pdf, other]

ThriftyNets : Convolutional Neural Networks with Tiny Parameter Budget

Authors: Guillaume Coiffier, Ghouthi Boukli Hacene, Vincent Gripon

Abstract: Typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lay in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort t… ▽ More Typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lay in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network at its maximum, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsamplings and shortcut ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameters budget, exceeding 91% accuracy on CIFAR-10 with less than 40K parameters in total, and 74.3% on CIFAR-100 with less than 600K parameters. △ Less

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2002.03090 [pdf, other]

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Authors: Miloš Nikolić, Ghouthi Boukli Hacene, Ciaran Bannon, Alberto Delmas Lascorz, Matthieu Courbariaux, Yoshua Bengio, Vincent Gripon, Andreas Moshovos

Abstract: Neural networks have demonstrably achieved state-of-the art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity whi… ▽ More Neural networks have demonstrably achieved state-of-the art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy. Namely, we propose a regularizer that penalizes large bitlength representations throughout the architecture and show how it can be modified to minimize other quantifiable criteria, such as number of operations or memory footprint. We demonstrate that our method learns thrifty representations while maintaining accuracy. With ImageNet, the method produces an average per layer bitlength of 4.13, 3.76 and 4.36 bits on AlexNet, ResNet18 and MobileNet V2 respectively, remaining within 2.0%, 0.5% and 0.5% of the base TOP-1 accuracy. △ Less

Submitted 11 August, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:1911.10287 [pdf, ps, other]

Training Modern Deep Neural Networks for Memory-Fault Robustness

Authors: Ghouthi Boukli Hacene, François Leduc-Primeau, Amal Ben Soussia, Vincent Gripon, François Gagnon

Abstract: Because deep neural networks (DNNs) rely on a large number of parameters and computations, their implementation in energy-constrained systems is challenging. In this paper, we investigate the solution of reducing the supply voltage of the memories used in the system, which results in bit-cell faults. We explore the robustness of state-of-the-art DNN architectures towards such defects and propose a… ▽ More Because deep neural networks (DNNs) rely on a large number of parameters and computations, their implementation in energy-constrained systems is challenging. In this paper, we investigate the solution of reducing the supply voltage of the memories used in the system, which results in bit-cell faults. We explore the robustness of state-of-the-art DNN architectures towards such defects and propose a regularizer meant to mitigate their effects on accuracy. Our experiments clearly demonstrate the interest of operating the system in a faulty regime to save energy without reducing accuracy. △ Less

Submitted 22 November, 2019; originally announced November 2019.

Journal ref: In 2019 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1-5). IEEE

arXiv:1911.07847 [pdf, ps, other]

Efficient Hardware Implementation of Incremental Learning and Inference on Chip

Authors: Ghouthi Boukli Hacene, Vincent Gripon, Nicolas Farrugia, Matthieu Arzel, Michel Jezequel

Abstract: In this paper, we tackle the problem of incrementally learning a classifier, one example at a time, directly on chip. To this end, we propose an efficient hardware implementation of a recently introduced incremental learning procedure that achieves state-of-the-art performance by combining transfer learning with majority votes and quantization techniques. The proposed design is able to accommodate… ▽ More In this paper, we tackle the problem of incrementally learning a classifier, one example at a time, directly on chip. To this end, we propose an efficient hardware implementation of a recently introduced incremental learning procedure that achieves state-of-the-art performance by combining transfer learning with majority votes and quantization techniques. The proposed design is able to accommodate for both new examples and new classes directly on the chip. We detail the hardware implementation of the method (implemented on FPGA target) and show it requires limited resources while providing a significant acceleration compared to using a CPU. △ Less

Submitted 17 November, 2019; originally announced November 2019.

Comments: In 2019 IEEE International NEWCAS Conference

arXiv:1911.03080 [pdf, ps, other]

Deep geometric knowledge distillation with graphs

Authors: Carlos Lassance, Myriam Bontonou, Ghouthi Boukli Hacene, Vincent Gripon, Jian Tang, Antonio Ortega

Abstract: In most cases deep learning architectures are trained disregarding the amount of operations and energy consumption. However, some applications, like embedded systems, can be resource-constrained during inference. A popular approach to reduce the size of a deep learning architecture consists in distilling knowledge from a bigger network (teacher) to a smaller one (student). Directly training the st… ▽ More In most cases deep learning architectures are trained disregarding the amount of operations and energy consumption. However, some applications, like embedded systems, can be resource-constrained during inference. A popular approach to reduce the size of a deep learning architecture consists in distilling knowledge from a bigger network (teacher) to a smaller one (student). Directly training the student to mimic the teacher representation can be effective, but it requires that both share the same latent space dimensions. In this work, we focus instead on relative knowledge distillation (RKD), which considers the geometry of the respective latent spaces, allowing for dimension-agnostic transfer of knowledge. Specifically we introduce a graph-based RKD method, in which graphs are used to capture the geometry of latent spaces. Using classical computer vision benchmarks, we demonstrate the ability of the proposed method to efficiently distillate knowledge from the teacher to the student, leading to better accuracy for the same budget as compared to existing RKD alternatives. △ Less

Submitted 8 November, 2019; originally announced November 2019.

arXiv:1905.12300 [pdf, ps, other]

Attention Based Pruning for Shift Networks

Authors: Ghouthi Boukli Hacene, Carlos Lassance, Vincent Gripon, Matthieu Courbariaux, Yoshua Bengio

Abstract: In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, it is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods h… ▽ More In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, it is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods have been proposed to replace the generic convolution operator by the combination of a shift operation and a simpler 1x1 convolution. The resulting block, called Shift Layer (SL), is an efficient alternative to CLs in the sense it allows to reach similar accuracies on various tasks with faster computations and fewer parameters. In this contribution, we introduce Shift Attention Layers (SALs), which extend SLs by using an attention mechanism that learns which shifts are the best at the same time the network function is trained. We demonstrate SALs are able to outperform vanilla SLs (and CLs) on various object recognition benchmarks while significantly reducing the number of float operations and parameters for the inference. △ Less

Submitted 29 May, 2019; originally announced May 2019.

arXiv:1905.00301 [pdf, ps, other]

Introducing Graph Smoothness Loss for Training Deep Learning Architectures

Authors: Myriam Bontonou, Carlos Lassance, Ghouthi Boukli Hacene, Vincent Gripon, Jian Tang, Antonio Ortega

Abstract: We introduce a novel loss function for training deep learning architectures to perform classification. It consists in minimizing the smoothness of label signals on similarity graphs built at the output of the architecture. Equivalently, it can be seen as maximizing the distances between the network function images of training inputs from distinct classes. As such, only distances between pairs of e… ▽ More We introduce a novel loss function for training deep learning architectures to perform classification. It consists in minimizing the smoothness of label signals on similarity graphs built at the output of the architecture. Equivalently, it can be seen as maximizing the distances between the network function images of training inputs from distinct classes. As such, only distances between pairs of examples in distinct classes are taken into account in the process, and the training does not prevent inputs from the same class to be mapped to distant locations in the output domain. We show that this loss leads to similar performance in classification as architectures trained using the classical cross-entropy, while offering interesting degrees of freedom and properties. We also demonstrate the interest of the proposed loss to increase robustness of trained architectures to deviations of the inputs. △ Less

Submitted 1 May, 2019; originally announced May 2019.

Comments: 5 pages

arXiv:1812.11337 [pdf, ps, other]

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Authors: Ghouthi Boukli Hacene, Vincent Gripon, Matthieu Arzel, Nicolas Farrugia, Yoshua Bengio

Abstract: Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large amount of parameters they contain leads to a high computational complexity and strongly limits their usability in budget-constrained devices such as embedded devices. In this paper, we propose a combination of a new pruning technique and a quan… ▽ More Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large amount of parameters they contain leads to a high computational complexity and strongly limits their usability in budget-constrained devices such as embedded devices. In this paper, we propose a combination of a new pruning technique and a quantization scheme that effectively reduce the complexity and memory usage of convolutional layers of CNNs, and replace the complex convolutional operation by a low-cost multiplexer. We perform experiments on the CIFAR10, CIFAR100 and SVHN and show that the proposed method achieves almost state-of-the-art accuracy, while drastically reducing the computational and memory footprints. We also propose an efficient hardware architecture to accelerate CNN operations. The proposed hardware architecture is a pipeline and accommodates multiple layers working at the same time to speed up the inference process. △ Less

Submitted 29 December, 2018; originally announced December 2018.

arXiv:1810.02020 [pdf, ps, other]

Transfer Incremental Learning using Data Augmentation

Authors: Ghouthi Boukli Hacene, Vincent Gripon, Nicolas Farrugia, Matthieu Arzel, Michel Jezequel

Abstract: Deep learning-based methods have reached state of the art performances, relying on large quantity of available data and computational power. Such methods still remain highly inappropriate when facing a major open machine learning problem, which consists of learning incrementally new classes and examples over time. Combining the outstanding performances of Deep Neural Networks (DNNs) with the flexi… ▽ More Deep learning-based methods have reached state of the art performances, relying on large quantity of available data and computational power. Such methods still remain highly inappropriate when facing a major open machine learning problem, which consists of learning incrementally new classes and examples over time. Combining the outstanding performances of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising venue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs as feature extractor, robust selection of feature vectors in subspaces using a nearest-class-mean based technique, majority votes and data augmentation at both the training and the prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method for low complexity incremental learning, while achieving significantly better accuracy than existing incremental counterparts. △ Less

Submitted 3 October, 2018; originally announced October 2018.

arXiv:1710.08637 [pdf, ps, other]

Improving Accuracy of Nonparametric Transfer Learning via Vector Segmentation

Authors: Vincent Gripon, Ghouthi B. Hacene, Matthias Löwe, Franck Vermet

Abstract: Transfer learning using deep neural networks as feature extractors has become increasingly popular over the past few years. It allows to obtain state-of-the-art accuracy on datasets too small to train a deep neural network on its own, and it provides cutting edge descriptors that, combined with nonparametric learning methods, allow rapid and flexible deployment of performing solutions in computati… ▽ More Transfer learning using deep neural networks as feature extractors has become increasingly popular over the past few years. It allows to obtain state-of-the-art accuracy on datasets too small to train a deep neural network on its own, and it provides cutting edge descriptors that, combined with nonparametric learning methods, allow rapid and flexible deployment of performing solutions in computationally restricted settings. In this paper, we are interested in showing that the features extracted using deep neural networks have specific properties which can be used to improve accuracy of downstream nonparametric learning methods. Namely, we demonstrate that for some distributions where information is embedded in a few coordinates, segmenting feature vectors can lead to better accuracy. We show how this model can be applied to real datasets by performing experiments using three mainstream deep neural network feature extractors and four databases, in vision and audio. △ Less

Submitted 24 October, 2017; originally announced October 2017.

Showing 1–20 of 20 results for author: Hacene, G B