-
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
Authors:
Zehao Fan,
Garrett Gagnon,
Zhenyu Liu,
Liu Liu
Abstract:
Transformer-based models are the foundation of modern machine learning, but their execution, particularly during autoregressive decoding in large language models (LLMs), places significant pressure on memory systems due to frequent memory accesses and growing key-value (KV) caches. This creates a bottleneck in memory bandwidth, especially as context lengths increase. Processing-in-memory (PIM) architectures are a promising solution, offering high internal bandwidth and compute parallelism near memory. However, current PIM designs are primarily optimized for dense attention and struggle with the dynamic, irregular access patterns introduced by modern KV cache sparsity techniques. Consequently, they suffer from workload imbalance, reducing throughput and resource utilization. In this work, we propose STARC, a novel sparsity-optimized data mapping scheme tailored specifically for efficient LLM decoding on PIM architectures. STARC clusters KV pairs by semantic similarity and maps them to contiguous memory regions aligned with PIM bank structures. During decoding, queries retrieve relevant tokens at cluster granularity by matching against precomputed centroids, enabling selective attention and parallel processing without frequent reclustering or data movement overhead. Experiments on the HBM-PIM system show that, compared to common token-wise sparsity methods, STARC reduces attention-layer latency by 19%–31% and energy consumption by 19%–27%. Under a KV cache budget of 1024, it achieves up to 54%–74% latency reduction and 45%–67% energy reduction compared to full KV cache retrieval. Meanwhile, STARC maintains model accuracy comparable to state-of-the-art sparse attention methods, demonstrating its effectiveness in enabling efficient and hardware-friendly long-context LLM inference on PIM architectures.
Submitted 9 May, 2025;
originally announced May 2025.
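The cluster-granularity retrieval summarized in the STARC abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`select_clusters`, `sparse_attention`), the dot-product scoring of centroids, and the greedy budget rule are all assumptions made for illustration, and nothing here models the PIM bank mapping.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_clusters(query, centroids, cluster_sizes, budget):
    """Pick clusters in descending query-centroid score while they fit the token budget.

    Hypothetical selection rule: a cluster that would exceed the budget is skipped.
    """
    order = sorted(range(len(centroids)),
                   key=lambda i: dot(query, centroids[i]),
                   reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + cluster_sizes[i] > budget:
            continue
        chosen.append(i)
        used += cluster_sizes[i]
    return chosen


def sparse_attention(query, keys, values, assignments, chosen):
    """Scaled softmax attention restricted to tokens in the selected clusters."""
    idx = [t for t, c in enumerate(assignments) if c in chosen]
    if not idx:
        raise ValueError("no tokens selected")
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[t]) * scale for t in idx]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[t][d] for w, t in zip(weights, idx)) / total
            for d in range(dim)]
```

Because selection happens per cluster rather than per token, every selected token sits in a group that was stored together, which is the property the abstract relies on for contiguous placement.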
-
WiP: Towards a Secure SECP256K1 for Crypto Wallets: Hardware Architecture and Implementation
Authors:
Joel Poncha Lemayian,
Ghyslain Gagnon,
Kaiwen Zhang,
Pascal Giard
Abstract:
The SECP256K1 elliptic curve algorithm is fundamental in cryptocurrency wallets for generating secure public keys from private keys, thereby ensuring the protection and ownership of blockchain-based digital assets. However, the literature highlights several successful side-channel attacks on hardware wallets that exploit SECP256K1 to extract private keys. This work proposes a novel hardware architecture for SECP256K1, optimized for side-channel attack resistance and efficient resource utilization. The architecture incorporates complete addition formulas, temporary registers, and parallel processing techniques, making elliptic curve point addition and doubling operations indistinguishable. Implementation results demonstrate an average reduction of 45% in LUT usage compared to similar works, emphasizing the design's resource efficiency.
Submitted 6 November, 2024;
originally announced November 2024.
-
Improvement Of Audiovisual Quality Estimation Using A Nonlinear Autoregressive Exogenous Neural Network And Bitstream Parameters
Authors:
Koffi Kossi,
Stephane Coulombe,
Christian Desrosiers,
Ghyslain Gagnon
Abstract:
With the increasing demand for audiovisual services, telecom service providers and application developers are compelled to ensure that their services provide the best possible user experience. Particularly, services such as videoconferencing are very sensitive to network conditions. Therefore, their performance should be monitored in real time in order to adjust parameters to any network perturbation. In this paper, we developed a parametric model for estimating the perceived audiovisual quality in videoconference services. Our model is developed with the nonlinear autoregressive exogenous (NARX) recurrent neural network and estimates the perceived quality in terms of mean opinion score (MOS). We validate our model using the publicly available INRS bitstream audiovisual quality dataset. This dataset contains bitstream parameters such as loss per frame, bit rate and video duration. We compare the proposed model against state-of-the-art methods based on machine learning and show our model to outperform these methods in terms of mean square error (MSE=0.150) and Pearson correlation coefficient (R=0.931).
Submitted 28 February, 2024;
originally announced February 2024.
-
Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons
Authors:
Zhenyu Liu,
Garrett Gagnon,
Swagath Venkataramani,
Liu Liu
Abstract:
Deep Neural Networks (DNNs) have revolutionized a wide range of industries, from healthcare and finance to automotive, by offering unparalleled capabilities in data analysis and decision-making. Despite their transformative impact, DNNs face two critical challenges: the vulnerability to adversarial attacks and the increasing computational costs associated with more complex and larger models. In this paper, we introduce an effective method designed to simultaneously enhance adversarial robustness and execution efficiency. Unlike prior studies that enhance robustness via uniformly injecting noise, we introduce a non-uniform noise injection algorithm, strategically applied at each DNN layer to disrupt adversarial perturbations introduced in attacks. By employing approximation techniques, our approach identifies and protects essential neurons while strategically introducing noise into non-essential neurons. Our experimental results demonstrate that our method successfully enhances both robustness and efficiency across several attack scenarios, model architectures, and datasets.
Submitted 6 February, 2024;
originally announced February 2024.
-
Energy Disaggregation using Variational Autoencoders
Authors:
Antoine Langevin,
Marc-André Carbonneau,
Mohamed Cheriet,
Ghyslain Gagnon
Abstract:
Non-intrusive load monitoring (NILM) is a technique that uses a single sensor to measure the total power consumption of a building. Using an energy disaggregation method, the consumption of individual appliances can be estimated from the aggregate measurement. Recent disaggregation algorithms have significantly improved the performance of NILM systems. However, the generalization capability of these methods to different houses as well as the disaggregation of multi-state appliances are still major challenges. In this paper we address these issues and propose an energy disaggregation approach based on the variational autoencoders framework. The probabilistic encoder makes this approach an efficient model for encoding information relevant to the reconstruction of the target appliance consumption. In particular, the proposed model accurately generates more complex load profiles, thus improving the power signal reconstruction of multi-state appliances. Moreover, its regularized latent space improves the generalization capabilities of the model across different houses. The proposed model is compared to state-of-the-art NILM approaches on the UK-DALE and REFIT datasets, and yields competitive results. The mean absolute error reduces by 18% on average across all appliances compared to the state-of-the-art. The F1-Score increases by more than 11%, showing improvements for the detection of the target appliance in the aggregate measurement.
Submitted 19 July, 2021; v1 submitted 22 March, 2021;
originally announced March 2021.
-
Measuring Disentanglement: A Review of Metrics
Authors:
Marc-André Carbonneau,
Julian Zaidi,
Jonathan Boilard,
Ghyslain Gagnon
Abstract:
Learning to disentangle and represent factors of variation in data is an important problem in AI. While many advances have been made to learn these representations, it is still unclear how to quantify disentanglement. While several metrics exist, little is known about their implicit assumptions, what they truly measure, and their limits. In consequence, it is difficult to interpret results when comparing different representations. In this work, we survey supervised disentanglement metrics and thoroughly analyze them. We propose a new taxonomy in which all metrics fall into one of three families: intervention-based, predictor-based and information-based. We conduct extensive experiments in which we isolate properties of disentangled representations, allowing stratified comparison along several axes. From our experimental results and analysis, we provide insights on relations between disentangled representation properties. Finally, we share guidelines on how to measure disentanglement.
Submitted 9 May, 2022; v1 submitted 16 December, 2020;
originally announced December 2020.
-
Multi-stage Jamming Attacks Detection using Deep Learning Combined with Kernelized Support Vector Machine in 5G Cloud Radio Access Networks
Authors:
Marouane Hachimi,
Georges Kaddoum,
Ghyslain Gagnon,
Poulmanogo Illy
Abstract:
In 5G networks, the Cloud Radio Access Network (C-RAN) is considered a promising future architecture in terms of minimizing energy consumption and allocating resources efficiently by providing real-time cloud infrastructures, cooperative radio, and centralized data processing. Recently, given their vulnerability to malicious attacks, the security of C-RAN networks has attracted significant attention. Among various anomaly-based intrusion detection techniques, the most promising one is the machine learning-based intrusion detection as it learns without human assistance and adjusts actions accordingly. In this direction, many solutions have been proposed, but they show either low accuracy in terms of attack classification or they offer just a single layer of attack detection. This research focuses on deploying a multi-stage machine learning-based intrusion detection (ML-IDS) in 5G C-RAN that can detect and classify four types of jamming attacks: constant jamming, random jamming, deceptive jamming, and reactive jamming. This deployment enhances security by minimizing the false negatives in C-RAN architectures. The experimental evaluation of the proposed solution is carried out using WSN-DS (Wireless Sensor Networks DataSet), which is a dedicated wireless dataset for intrusion detection. The final classification accuracy of attacks is 94.51% with a 7.84% false negative rate.
Submitted 14 April, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Early Detection for Optimal-Latency Communications in Multi-Hop Links
Authors:
Diego Barragán Guerrero,
Minh Au,
Ghyslain Gagnon,
François Gagnon,
Pascal Giard
Abstract:
Modern wireless machine-to-machine-type communications aim to provide both ultra reliability and low latency, stringent requirements that appear to be mutually exclusive. From the noisy channel coding theorem, we know that reliable communications mandate transmission rates that are lower than the channel capacity. To guarantee arbitrarily-low error probability, this implies the use of messages whose lengths tend to infinity. However, long messages are not suitable for low-latency communications. In this paper, we propose an early-detection scheme for wireless communications under a finite-blocklength regime that employs a sequential-test technique to reduce latency while maintaining reliability. We prove that our scheme leads to an average detection time smaller than the symbol duration. Furthermore, in multi-hop low-traffic or continuous-transmission links, we show that our scheme can reliably detect symbols before the end of their transmission, significantly reducing the latency, while keeping the error probability below a predefined threshold.
Submitted 8 July, 2019; v1 submitted 4 July, 2019;
originally announced July 2019.
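The abstract above mentions a sequential-test technique for deciding on a symbol before its transmission ends. As a generic illustration only, the sketch below applies Wald's sequential probability ratio test to antipodal samples in Gaussian noise. The signal model (means of +amplitude and -amplitude, known noise variance), the symmetric error target, and the function name `sprt_detect` are assumptions for this example and are not taken from the paper.

```python
import math


def sprt_detect(samples, amplitude, noise_var, error_prob):
    """Wald SPRT between H1 (mean = +amplitude) and H0 (mean = -amplitude).

    Returns (decided_bit, samples_used). If neither threshold is crossed,
    falls back to the sign of the accumulated log-likelihood ratio.
    """
    # Symmetric Wald thresholds for equal target error probabilities.
    upper = math.log((1.0 - error_prob) / error_prob)
    lower = -upper
    llr = 0.0
    for n, y in enumerate(samples, start=1):
        # Per-sample log-likelihood ratio for Gaussian noise: 2*A*y / sigma^2.
        llr += 2.0 * amplitude * y / noise_var
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 0, n
    return (1 if llr >= 0.0 else 0), len(samples)
```

With clean samples the test stops early: the decision is made as soon as the accumulated evidence crosses a threshold, rather than after the full observation window.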
-
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
Authors:
Marc-André Carbonneau,
Eric Granger,
Ghyslain Gagnon
Abstract:
A growing number of applications, e.g. video surveillance and medical image analysis, require training recognition systems from large amounts of weakly annotated data while some targeted interactions with a domain expert are allowed to improve the training process. In such cases, active learning (AL) can reduce labeling costs for training a classifier by querying the expert to provide the labels of the most informative instances. This paper focuses on AL methods for instance classification problems in multiple instance learning (MIL), where data is arranged into sets, called bags, that are weakly labeled. Most AL methods focus on single instance learning problems. These methods are not suitable for MIL problems because they cannot account for the bag structure of data. In this paper, new methods for bag-level aggregation of instance informativeness are proposed for multiple instance active learning (MIAL). The aggregated informativeness method identifies the most informative instances based on classifier uncertainty, and queries bags incorporating the most information. The other proposed method, called cluster-based aggregative sampling, clusters data hierarchically in the instance space. The informativeness of instances is assessed by considering bag labels, inferred instance labels, and the proportion of labels that remain to be discovered in clusters. Both proposed methods significantly outperform reference methods in extensive experiments using benchmark data from several application domains. Results indicate that using an appropriate strategy to address MIAL problems yields a significant reduction in the number of queries needed to achieve the same level of performance as single instance AL methods.
Submitted 6 October, 2017;
originally announced October 2017.
-
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Authors:
Marc-André Carbonneau,
Veronika Cheplygina,
Eric Granger,
Ghyslain Gagnon
Abstract:
Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows weakly labeled data to be leveraged. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research.
Submitted 10 December, 2016;
originally announced December 2016.
-
Feature Learning from Spectrograms for Assessment of Personality Traits
Authors:
Marc-André Carbonneau,
Eric Granger,
Yazid Attabi,
Ghyslain Gagnon
Abstract:
Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf toolboxes. To achieve high accuracy, numerous features are typically extracted using complex and highly parameterized algorithms. In this paper, a new method based on feature learning and spectrogram analysis is proposed to simplify the feature extraction process while maintaining a high level of accuracy. The proposed method learns a dictionary of discriminant features from patches extracted in the spectrogram representations of training speech segments. Each speech segment is then encoded using the dictionary, and the resulting feature set is used to perform classification of personality traits. Experiments indicate that the proposed method achieves state-of-the-art results with a significant reduction in complexity when compared to the most recent reference methods. The number of features and the difficulties linked to the feature extraction process are greatly reduced as only one type of descriptor is used, for which the 6 parameters can be tuned automatically. In contrast, the simplest reference method uses 4 types of descriptors to which 6 functionals are applied, resulting in over 20 parameters to be tuned.
Submitted 4 October, 2016;
originally announced October 2016.