-
Low-Latency Software Polar Encoders and Decoders for Short Blocklengths
Authors:
Mathieu Leonardon,
Mohammed El Houcine Ayoubi,
Adrien Cassagne,
Romain Tajan,
Camille Leroux
Abstract:
This paper presents our low-latency Polar code encoders and decoders developed for the 2025 International Symposium on Topics in Coding (ISTC 2025) contest, which challenges participants to implement the fastest possible channel code encoders and decoders in terms of average and maximum latency on a CPU target. Our solution is based on Polar codes with an Adaptive Successive Cancellation List (ASCL) decoder. We introduce a novel ASCL unrolled decoder generator. We conduct an extensive exploration of the design space, including code construction, CRC selection, and list size, to identify optimal trade-offs between signal-to-noise ratio and decoding time across various operating points. The considered operating points are frame error rates of 10^{-3} and 10^{-5}, information bit lengths of 64, 128, 256, and 512, and code rates of 1/4, 1/2, and 4/5. We also propose an optimized bit-packed encoder. All implementations of the encoders and decoders, along with the code construction and the unrolled decoder generator, are released as open source in the AFF3CT toolbox.
Submitted 7 July, 2025;
originally announced July 2025.
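As an illustration of what a bit-packed polar encoder can look like, the sketch below applies the polar transform x = u F^{⊗n} over GF(2) in two ways: bit by bit on a list, and on a whole word packed into one integer, where each butterfly stage becomes a shift, a mask, and an XOR. This is a minimal sketch and not the AFF3CT implementation; the function names, the LSB-first packing convention, and the absence of frozen-bit insertion and CRC are all assumptions made for the example.

```python
def polar_transform_naive(bits):
    """Apply x = u * F^(kron n) over GF(2) to a list of 0/1 values.

    The list length must be a power of two. No bit-reversal is applied.
    """
    x = list(bits)
    n = len(x)
    step = 1
    while step < n:
        # One butterfly stage: the first half of each block absorbs the second half.
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


def stage_mask(n, step):
    """Mask selecting positions i in [0, n) that lie in the first half of a block."""
    mask = 0
    for i in range(n):
        if (i // step) % 2 == 0:
            mask |= 1 << i
    return mask


def polar_transform_packed(word, n):
    """Same transform on an n-bit integer where bit i of `word` holds u[i]."""
    step = 1
    while step < n:
        # Shifting right by `step` aligns bit i + step with bit i;
        # the mask keeps only the positions that must be updated.
        word ^= (word >> step) & stage_mask(n, step)
        step *= 2
    return word


def pack(bits):
    """Pack a list of bits into an integer, bit i at position i (hypothetical convention)."""
    return sum(b << i for i, b in enumerate(bits))


def unpack(word, n):
    """Inverse of pack for an n-bit word."""
    return [(word >> i) & 1 for i in range(n)]
```

In a real encoder the masks would be precomputed constants, so each stage costs a handful of word-level instructions regardless of how many bits share the word.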
-
Event Classification of Accelerometer Data for Industrial Package Monitoring with Embedded Deep Learning
Authors:
Manon Renault,
Hamoud Younes,
Hugo Tessier,
Ronan Le Roy,
Bastien Pasdeloup,
Mathieu Léonardon
Abstract:
Package monitoring is an important topic in industrial applications, with significant implications for operational efficiency and ecological sustainability. In this study, we propose an approach that employs an embedded system, placed on reusable packages, to detect their state (on a Forklift, in a Truck, or in an undetermined location). We aim to design a system with a lifespan of several years, corresponding to the lifespan of reusable packages. Our analysis demonstrates that maximizing device lifespan requires minimizing wake time. We propose a pipeline that includes data processing, training, and evaluation of the deep learning model designed for imbalanced, multiclass time series data collected from an embedded sensor. The method uses a one-dimensional Convolutional Neural Network architecture to classify accelerometer data from the IoT device. Before training, two data augmentation techniques are tested to solve the imbalance problem of the dataset: the Synthetic Minority Oversampling TEchnique and the ADAptive SYNthetic sampling approach. After training, compression techniques are applied to reduce the model size. On the considered two-class problem, the methodology yields a precision of 94.54% for the first class and 95.83% for the second class, while compression techniques reduce the model size by a factor of four. The trained model is deployed on the IoT device, where it operates with a power consumption of 316 mW during inference.
Submitted 5 June, 2025;
originally announced June 2025.
-
Input Resolution Downsizing as a Compression Technique for Vision Deep Learning Systems
Authors:
Jeremy Morlier,
Mathieu Leonardon,
Vincent Gripon
Abstract:
Model compression is a critical area of research in deep learning, in particular in vision, driven by the need to lighten models' memory or computational footprints. While numerous methods for model compression have been proposed, most focus on pruning, quantization, or knowledge distillation. In this work, we delve into an under-explored avenue: reducing the resolution of the input image as a complementary approach to other types of compression. By systematically investigating the impact of input resolution reduction, on both classification and semantic segmentation tasks, and on both convnets and transformer-based architectures, we demonstrate that this strategy provides an interesting alternative for model compression. Our experimental results on standard benchmarks highlight the potential of this method, achieving competitive performance while significantly reducing computational and memory requirements. This study establishes input resolution reduction as a viable and promising direction in the broader landscape of model compression techniques for vision applications.
Submitted 1 April, 2025;
originally announced April 2025.
-
FLoCoRA: Federated learning compression with low-rank adaptation
Authors:
Lucas Grativol Ribeiro,
Mathieu Leonardon,
Guillaume Muller,
Virginie Fresse,
Matthieu Arzel
Abstract:
Low-Rank Adaptation (LoRA) methods have gained popularity in efficient parameter fine-tuning of models containing hundreds of billions of parameters. In this work, instead, we demonstrate the application of LoRA methods to train small-vision models in Federated Learning (FL) from scratch. We first propose an aggregation-agnostic method to integrate LoRA within FL, named FLoCoRA, showing that the method is capable of reducing communication costs by 4.8 times, while having less than 1% accuracy degradation, for a CIFAR-10 classification task with a ResNet-8. Next, we show that the same method can be extended with an affine quantization scheme, dividing the communication cost by 18.6 compared with the standard method, still with less than 1% accuracy loss, tested on a ResNet-18 model. Our formulation represents a strong baseline for message size reduction, even when compared to conventional model compression works, while also reducing the training memory requirements due to the low-rank adaptation.
Submitted 20 June, 2024;
originally announced June 2024.
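To make the link between low-rank adaptation and message size concrete, the sketch below counts the parameters of a dense weight matrix against those of its two low-rank factors. It is only an arithmetic illustration for a single fully connected layer, assuming that only the factors are exchanged between clients and server; the function names and layer sizes are hypothetical, and the sketch does not reproduce the ResNet figures reported in the abstract.

```python
def dense_params(d_out, d_in):
    """Number of entries in a full d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Number of entries in the factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def message_reduction(d_out, d_in, rank):
    """Ratio of dense to low-rank parameter counts for one layer."""
    return dense_params(d_out, d_in) / lora_params(d_out, d_in, rank)


# Example with made-up sizes: a 256 x 256 layer adapted at rank 8.
ratio = message_reduction(256, 256, 8)
```

The ratio grows as the rank shrinks relative to the layer dimensions, which is why the choice of rank trades message size against model capacity.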
-
PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC
Authors:
Lucas Grativol Ribeiro,
Lubin Gauthier,
Mathieu Leonardon,
Jérémy Morlier,
Antoine Lavrard-Meyer,
Guillaume Muller,
Virginie Fresse,
Matthieu Arzel
Abstract:
This paper tackles the challenges of implementing few-shot learning on embedded systems, specifically FPGA SoCs, a vital approach for adapting to diverse classification tasks, especially when the costs of data acquisition or labeling prove to be prohibitively high. Our contributions encompass the development of an end-to-end open-source pipeline for a few-shot learning platform for object classification on FPGA SoCs. The pipeline is built on top of the Tensil open-source framework, facilitating the design, training, evaluation, and deployment of DNN backbones tailored for few-shot learning. Additionally, we showcase our work's potential by building and deploying a low-power, low-latency demonstrator trained on the MiniImageNet dataset with a dataflow architecture. The proposed system has a latency of 30 ms while consuming 6.2 W on the PYNQ-Z1 board.
Submitted 30 April, 2024;
originally announced April 2024.
-
Federated learning compression designed for lightweight communications
Authors:
Lucas Grativol Ribeiro,
Mathieu Leonardon,
Guillaume Muller,
Virginie Fresse,
Matthieu Arzel
Abstract:
Federated Learning (FL) is a promising distributed method for edge-level machine learning, particularly for privacy-sensitive applications such as those in military and medical domains, where client data cannot be shared or transferred to a cloud computing server. In many use cases, communication cost is a major challenge in FL due to its inherently intensive network usage. Client devices, such as smartphones or Internet of Things (IoT) nodes, have limited resources in terms of energy, computation, and memory. To address these hardware constraints, lightweight models and compression techniques such as pruning and quantization are commonly adopted in centralised paradigms. In this paper, we investigate the impact of compression techniques on FL for a typical image classification task. Going further, we demonstrate that a straightforward method can compress messages by up to 50% while having less than 1% accuracy loss, competing with state-of-the-art techniques.
Submitted 23 October, 2023;
originally announced October 2023.
-
DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables
Authors:
Darshan C. Ganji,
Saad Ashfaq,
Ehsan Saboori,
Sudhakar Sah,
Saptarshi Mitra,
MohammadHossein AskariHemmat,
Alexander Hoffman,
Ahmed Hassanien,
Mathieu Léonardon
Abstract:
A lot of recent progress has been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model accuracy that is comparable to full-precision floating-point baselines even with sub-byte quantization. However, it is extremely challenging to deploy these ultra low-bit quantized models on mainstream CPU devices because commodity SIMD (Single Instruction, Multiple Data) hardware typically supports no less than 8-bit precision. To overcome this limitation, we propose DeepGEMM, a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.
Submitted 18 April, 2023;
originally announced April 2023.
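The lookup-table idea described in this abstract can be sketched in a few lines: with 2-bit operands there are only 16 possible weight-activation pairs, so every product can be precomputed once and a dot product reduces to table reads and additions. The code below is a scalar illustration only, not the DeepGEMM kernels; the code-to-value mappings, the index layout, and the function names are assumptions chosen for the example, and no SIMD is involved.

```python
BITS = 2
LEVELS = 1 << BITS  # 4 possible codes per 2-bit operand

# Hypothetical mapping from 2-bit codes to quantized values.
WEIGHT_VALUES = [-2, -1, 0, 1]   # signed weights
ACT_VALUES = [0, 1, 2, 3]        # unsigned activations

# Precompute all 16 products, indexed by (weight_code << BITS) | act_code.
LUT = [
    WEIGHT_VALUES[w] * ACT_VALUES[a]
    for w in range(LEVELS)
    for a in range(LEVELS)
]


def lut_dot(w_codes, a_codes):
    """Dot product of two code vectors using only table lookups and additions."""
    return sum(LUT[(w << BITS) | a] for w, a in zip(w_codes, a_codes))


def reference_dot(w_codes, a_codes):
    """Same dot product computed with explicit multiplications, for checking."""
    return sum(WEIGHT_VALUES[w] * ACT_VALUES[a] for w, a in zip(w_codes, a_codes))
```

Because the table is indexed by the concatenated codes, a vectorized implementation can fetch several products per instruction with byte-shuffle style lookups, which is one plausible reason the approach suits SIMD hardware that lacks sub-byte arithmetic.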
-
Energy Consumption Analysis of pruned Semantic Segmentation Networks on an Embedded GPU
Authors:
Hugo Tessier,
Vincent Gripon,
Mathieu Léonardon,
Matthieu Arzel,
David Bertrand,
Thomas Hannagan
Abstract:
Deep neural networks are the state of the art in many computer vision tasks. Their deployment in the context of autonomous vehicles is of particular interest, since the vehicles' limitations in terms of energy consumption prohibit the use of very large networks, which typically reach the best performance. A common method to reduce the complexity of these architectures, without sacrificing accuracy, is to rely on pruning, in which the least important portions are eliminated. There is a large literature on the subject, but interestingly few works have measured the actual impact of pruning on energy. In this work, we are interested in measuring it in the specific context of semantic segmentation for autonomous driving, using the Cityscapes dataset. To this end, we analyze the impact of recently proposed structured pruning methods when trained architectures are deployed on a Jetson Xavier embedded GPU.
Submitted 13 June, 2022;
originally announced June 2022.
-
Leveraging Structured Pruning of Convolutional Neural Networks
Authors:
Hugo Tessier,
Vincent Gripon,
Mathieu Léonardon,
Matthieu Arzel,
David Bertrand,
Thomas Hannagan
Abstract:
Structured pruning is a popular method to reduce the cost of convolutional neural networks, which are the state of the art in many computer vision tasks. However, depending on the architecture, pruning introduces dimensional discrepancies which prevent the actual reduction of pruned networks. To tackle this problem, we propose a method that is able to take any structured pruning mask and generate a network that does not encounter any of these problems and can be leveraged efficiently. We provide an accurate description of our solution and show the gains in energy consumption and inference time of pruned convolutional neural networks on embedded hardware.
Submitted 13 June, 2022;
originally announced June 2022.
-
Using Deep Neural Networks to Predict and Improve the Performance of Polar Codes
Authors:
Mathieu Léonardon,
Vincent Gripon
Abstract:
Polar codes can theoretically achieve very competitive Frame Error Rates. In practice, their performance may depend on the chosen decoding procedure, as well as other parameters of the communication system they are deployed upon. As a consequence, designing efficient polar codes for a specific context can quickly become challenging. In this paper, we introduce a methodology that consists in training deep neural networks to predict the frame error rate of polar codes based on their frozen bit construction sequence. We introduce an algorithm based on Projected Gradient Descent that leverages the gradient of the neural network function to generate promising frozen bit sequences. We showcase on generated datasets the ability of the proposed methodology to produce codes more efficient than those used to train the neural networks, even when the latter are selected among the most efficient ones.
Submitted 11 May, 2021;
originally announced May 2021.
-
Rethinking Weight Decay For Efficient Neural Network Pruning
Authors:
Hugo Tessier,
Vincent Gripon,
Mathieu Léonardon,
Matthieu Arzel,
Thomas Hannagan,
David Bertrand
Abstract:
Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.
Submitted 9 March, 2022; v1 submitted 20 November, 2020;
originally announced November 2020.
-
Fast and Flexible Software Polar List Decoders
Authors:
Mathieu Léonardon,
Adrien Cassagne,
Camille Leroux,
Christophe Jégo,
Louis-Philippe Hamelin,
Yvon Savaria
Abstract:
Flexibility is one mandatory aspect of channel coding in modern wireless communication systems. Among other things, the channel decoder has to support several code lengths and code rates. This need for flexibility applies to polar codes that are considered for control channels in the future 5G standard. This paper presents a new generic and flexible implementation of a software Successive Cancellation List (SCL) decoder. A large set of parameters can be fine-tuned dynamically without re-compiling the software source code: the code length, the code rate, the frozen bits set, the puncturing patterns, the cyclic redundancy check, the list size, the type of decoding algorithm, the tree-pruning strategy and the data quantization. This generic and flexible SCL decoder makes it possible to explore tradeoffs between throughput, latency and decoding performance. Several optimizations are proposed to achieve a competitive decoding speed despite the constraints induced by the genericity and the flexibility. The resulting polar list decoder is about 4 times faster than a generic software decoder and only 2 times slower than a non-flexible unrolled decoder. Thanks to the flexibility of the decoder, the fully adaptive SCL algorithm can be easily implemented and achieves higher throughput than any other similar decoder in the literature (up to 425 Mb/s on a single processor core for N = 2048 and K = 1723 at 4.5 dB).
Submitted 23 October, 2017;
originally announced October 2017.