-
Online Gesture Recognition using Transformer and Natural Language Processing
Authors:
G. C. M. Silvestre,
F. Balado,
O. Akinremi,
M. Ramo
Abstract:
The Transformer architecture is shown to provide a powerful machine transduction framework for online handwritten gestures corresponding to glyph strokes of natural language sentences. The attention mechanism is successfully used to create latent representations of an end-to-end encoder-decoder model, solving multi-level segmentation while also learning some language features and syntax rules. The additional use of a large decoding space with some learned Byte-Pair-Encoding (BPE) is shown to provide robustness to ablated inputs and syntax rules. The encoder stack was directly fed with spatio-temporal data tokens potentially forming an infinitely large input vocabulary, an approach that finds applications beyond that of this work. Encoder transfer learning capabilities are also demonstrated on several languages, resulting in faster optimisation and shared parameters. A new supervised dataset of online handwriting gestures suitable for generic handwriting recognition tasks was used to successfully train a small transformer model to an average normalised Levenshtein accuracy of 96% on English or German sentences and 94% on French sentences.
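The listing does not define the "normalised Levenshtein accuracy" it reports. A minimal sketch of one common reading follows, assuming the accuracy of a prediction is one minus the edit distance divided by the length of the longer string; the function names and this normalisation are assumptions, not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalised_accuracy(prediction: str, target: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(prediction), len(target))
    if longest == 0:
        return 1.0  # two empty strings match exactly
    return 1.0 - levenshtein(prediction, target) / longest


def average_accuracy(pairs) -> float:
    """Mean normalised accuracy over (prediction, target) pairs."""
    scores = [normalised_accuracy(p, t) for p, t in pairs]
    return sum(scores) / len(scores)
```

Under this reading, a single wrong character in a 25-character sentence would score 0.96.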
Submitted 5 May, 2023;
originally announced May 2023.
-
A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference
Authors:
Manuel Le Gallo,
Riduan Khaddam-Aljameh,
Milos Stanisavljevic,
Athanasios Vasilopoulos,
Benedikt Kersting,
Martino Dazzi,
Geethan Karunaratne,
Matthias Braendli,
Abhairaj Singh,
Silvia M. Mueller,
Julian Buechel,
Xavier Timoneda,
Vinay Joshi,
Urs Egger,
Angelo Garofalo,
Anastasios Petropoulos,
Theodore Antonakopoulos,
Kevin Brew,
Samuel Choi,
Injo Ok,
Timothy Philip,
Victor Chan,
Claire Silvestre,
Ishtiaq Ahsan,
Nicole Saulnier,
et al. (4 additional authors not shown)
Abstract:
The need to repeatedly shuttle around synaptic weight values from memory to processing units has been a key source of energy inefficiency associated with hardware implementation of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds high promise to overcome this challenge, by performing matrix-vector multiplications (MVMs) directly within the network weights stored on a chip to execute an inference workload. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and communication to move towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable to achieve high MVM and inference accuracy without application-wise re-tuning of the chip. Here, we present a multi-core AIMC chip designed and fabricated in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully-integrated chip features 64 256x256 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and processing involved in ResNet convolutional neural networks and long short-term memory (LSTM) networks. We demonstrate near software-equivalent inference accuracy with ResNet and LSTM networks while implementing all the computations associated with the weight layers and the activation functions on-chip. The chip can achieve a maximal throughput of 63.1 TOPS at an energy efficiency of 9.76 TOPS/W for 8-bit input/output matrix-vector multiplications.
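The headline figures in the abstract can be combined with simple arithmetic. The sketch below assumes one weight per crossbar position and assumes the peak throughput and the quoted efficiency refer to the same operating point; neither assumption is confirmed by the listing, and the constant and function names are illustrative.

```python
CORES = 64             # AIMC cores on the chip (from the abstract)
ROWS = COLS = 256      # crossbar dimensions per core (from the abstract)
PEAK_TOPS = 63.1       # maximal throughput, tera-operations per second
TOPS_PER_WATT = 9.76   # energy efficiency for 8-bit input/output MVMs


def weight_capacity(cores: int = CORES, rows: int = ROWS, cols: int = COLS) -> int:
    """Crossbar positions across all cores (assumes one weight per position)."""
    return cores * rows * cols


def implied_power_watts(tops: float = PEAK_TOPS,
                        tops_per_watt: float = TOPS_PER_WATT) -> float:
    """Power implied if both figures describe the same operating point."""
    return tops / tops_per_watt


if __name__ == "__main__":
    print(f"crossbar positions: {weight_capacity():,}")
    print(f"implied power: {implied_power_watts():.2f} W")
```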
Submitted 6 December, 2022;
originally announced December 2022.
-
A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions
Authors:
Mirco Ramo,
Guénolé C. M. Silvestre
Abstract:
The Transformer architecture is shown to provide a powerful framework as an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes. In particular, the attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions creating latent representations that are correctly decoded to the exact mathematical expression tree, providing robustness to ablated inputs and unseen glyphs. For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary, which finds applications beyond that of online gesture recognition. A new supervised dataset of online handwriting gestures is provided for training models on generic handwriting recognition tasks and a new metric is proposed for the evaluation of the syntactic correctness of the output expression trees. A small Transformer model suitable for edge inference was successfully trained to an average normalised Levenshtein accuracy of 94%, resulting in valid postfix RPN tree representation for 94% of predictions.
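The abstract reports that 94% of predictions form a valid postfix (RPN) representation but does not describe the check itself. One standard way to test well-formedness is to track stack depth while scanning the tokens, sketched below. The operator table `ARITY` and the token spellings are hypothetical, and this is not the syntactic-correctness metric proposed in the paper.

```python
# Hypothetical operator table: token -> number of operands it consumes.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2, "sqrt": 1, "neg": 1}


def is_valid_postfix(tokens) -> bool:
    """Return True if the token sequence reduces to exactly one expression tree."""
    depth = 0  # number of complete sub-expressions currently on the stack
    for token in tokens:
        arity = ARITY.get(token, 0)  # unknown tokens are treated as operands
        if depth < arity:
            return False  # operator appears before enough operands
        depth += 1 - arity  # pop `arity` operands, push one result
    return depth == 1
```

For example, `["x", "sqrt", "y", "*"]` is accepted, while `["2", "+"]` and `["2", "3"]` are rejected.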
Submitted 4 November, 2022;
originally announced November 2022.
-
In-memory Realization of In-situ Few-shot Continual Learning with a Dynamically Evolving Explicit Memory
Authors:
Geethan Karunaratne,
Michael Hersche,
Jovin Langenegger,
Giovanni Cherubini,
Manuel Le Gallo-Bourdeau,
Urs Egger,
Kevin Brew,
Sam Choi,
Injo Ok,
Mary Claire Silvestre,
Ning Li,
Nicole Saulnier,
Victor Chan,
Ishtiaq Ahsan,
Vijay Narayanan,
Luca Benini,
Abu Sebastian,
Abbas Rahimi
Abstract:
Continually learning new classes from a few training examples without forgetting previously learned classes demands a flexible architecture with an inevitably growing portion of storage, in which new examples and classes can be incrementally stored and efficiently retrieved. One viable architectural solution is to tightly couple a stationary deep neural network to a dynamically evolving explicit memory (EM). As the centerpiece of this architecture, we propose an EM unit that leverages energy-efficient in-memory compute (IMC) cores during the course of continual learning operations. We demonstrate for the first time how the EM unit can physically superpose multiple training examples, expand to accommodate unseen classes, and perform similarity search during inference, using operations on an IMC core based on phase-change memory (PCM). Specifically, the physical superposition of a few encoded training examples is realized via in-situ progressive crystallization of PCM devices. The classification accuracy achieved on the IMC core remains within a range of 1.28% to 2.5% compared to that of the state-of-the-art full-precision baseline software model on both the CIFAR-100 and miniImageNet datasets when continually learning 40 novel classes (from only five examples per class) on top of 60 old classes.
Submitted 14 July, 2022;
originally announced July 2022.
-
A Discrete-time Reputation-based Resilient Consensus Algorithm for Synchronous or Asynchronous Communications
Authors:
Guilherme Ramos,
Daniel Silvestre,
Carlos Silvestre
Abstract:
We tackle the problem of a set of agents achieving resilient consensus in the presence of attacked agents. We present a discrete-time reputation-based consensus algorithm for synchronous and asynchronous networks by developing a local strategy where, at each time, each agent assigns a reputation (between zero and one) to each neighbor. The reputation is then used to weigh the neighbors' values in the update of its state. Under mild assumptions, we show that: (i) the proposed method converges exponentially to the consensus of the regular agents; (ii) if a regular agent identifies a neighbor as an attacked node, then it is indeed an attacked node; (iii) if the consensus value of the normal nodes differs from any of the attacked nodes' values, then the reputation that a regular agent assigns to the attacked neighbors goes to zero. Further, we extend our method to achieve resilience in the scenarios where there are noisy nodes, dynamic networks and stochastic node selection. Finally, we illustrate our algorithm with several examples, and we delineate some attacking scenarios that can be dealt with by the current proposal but not by the state-of-the-art approaches.
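The abstract states only that each agent assigns every neighbor a reputation between zero and one and uses it to weigh that neighbor's value in its own update. The toy sketch below illustrates that general idea on a fully connected, synchronous network. The reputation rule used here (one minus the neighbor's distance divided by the largest distance seen) is an assumption made for illustration and is not the rule from the paper; the function names are likewise hypothetical.

```python
def reputation_step(values, attacked):
    """One synchronous round on a complete graph (needs at least two agents)."""
    new_values = list(values)
    for i, x_i in enumerate(values):
        if i in attacked:
            continue  # attacked agents ignore the protocol and keep their value
        neighbours = [j for j in range(len(values)) if j != i]
        distances = {j: abs(values[j] - x_i) for j in neighbours}
        largest = max(distances.values())
        # Assumed rule: the farthest neighbour gets reputation 0, an identical one gets 1.
        reputations = {
            j: (1.0 - d / largest) if largest > 0 else 1.0
            for j, d in distances.items()
        }
        total = 1.0 + sum(reputations.values())  # the agent trusts itself fully
        weighted = x_i + sum(reputations[j] * values[j] for j in neighbours)
        new_values[i] = weighted / total
    return new_values


def run_consensus(values, attacked, steps):
    """Apply `steps` synchronous rounds and return the final values."""
    for _ in range(steps):
        values = reputation_step(values, attacked)
    return values


if __name__ == "__main__":
    # Four regular agents and one attacked agent (index 4) holding a constant value.
    print(run_consensus([1.0, 2.0, 3.0, 4.0, 100.0], attacked={4}, steps=50))
```

In this toy setting the attacked agent is always the farthest neighbor, so its reputation is zero and the regular agents settle on a value inside the range of their initial states. The simplified rule also zeroes out the farthest honest neighbor when no attacker is present, which is one reason it should not be read as the published algorithm.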
Submitted 1 July, 2021;
originally announced July 2021.
-
Open Source Dataset and Deep Learning Models for Online Digit Gesture Recognition on Touchscreens
Authors:
Philip J. Corr,
Guenole C. Silvestre,
Chris J. Bleakley
Abstract:
This paper presents an evaluation of deep neural networks for recognition of digits entered by users on a smartphone touchscreen. A new large dataset of Arabic numerals was collected for training and evaluation of the network. The dataset consists of spatial and temporal touch data recorded for 80 digits entered by 260 users. Two neural network models were investigated. The first model was a 2D convolutional neural network (ConvNet) applied to bitmaps of the glyphs created by interpolation of the sensed screen touches and its topology is similar to that of previously published models for offline handwriting recognition from scanned images. The second model used a 1D ConvNet architecture but was applied to the sequence of polar vectors connecting the touch points. The models were found to provide accuracies of 98.50% and 95.86%, respectively. The second model was much simpler, providing a reduction in the number of parameters from 1,663,370 to 287,690. The dataset has been made available to the community as an open source resource.
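The second model consumes "the sequence of polar vectors connecting the touch points". A minimal sketch of that preprocessing step is shown below, assuming touch points arrive as `(x, y)` pairs in screen coordinates and that each vector is encoded as `(length, angle in radians)`. The function name and the exact encoding are assumptions; the paper may normalise or resample differently.

```python
import math


def to_polar_vectors(points):
    """Convert consecutive touch points into (length, angle) displacement vectors."""
    vectors = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        # hypot gives the segment length, atan2 the direction in (-pi, pi]
        vectors.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return vectors


if __name__ == "__main__":
    stroke = [(0, 0), (3, 4), (3, 5)]
    for length, angle in to_polar_vectors(stroke):
        print(f"length={length:.2f} angle={angle:.3f} rad")
```

A stroke of n points yields n - 1 vectors, so the representation is translation invariant by construction.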
Submitted 20 September, 2017;
originally announced September 2017.
-
Automated Identification of Trampoline Skills Using Computer Vision Extracted Pose Estimation
Authors:
Paul W. Connolly,
Guenole C. Silvestre,
Chris J. Bleakley
Abstract:
A novel method to identify trampoline skills using a single video camera is proposed herein. Conventional computer vision techniques are used for identification, estimation, and tracking of the gymnast's body in a video recording of the routine. For each frame, an open source convolutional neural network is used to estimate the pose of the athlete's body. Body orientation and joint angle estimates are extracted from these pose estimates. The trajectories of these angle estimates over time are compared with those of labelled reference skills. A nearest neighbour classifier utilising a mean squared error distance metric is used to identify the skill performed. A dataset containing 714 skill examples with 20 distinct skills performed by adult male and female gymnasts was recorded and used for evaluation of the system. The system was found to achieve a skill identification accuracy of 80.7% for the dataset.
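The classification step described above, a nearest neighbour classifier with a mean squared error distance over angle trajectories, can be sketched as follows. The sketch assumes every trajectory has already been resampled to the same length and flattened into a list of numbers; the skill labels and function names are hypothetical.

```python
def mean_squared_error(a, b):
    """Mean squared difference between two equal-length trajectories."""
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def classify(trajectory, references):
    """Return the label of the reference trajectory closest to `trajectory`.

    `references` is a list of (label, trajectory) pairs.
    """
    label, _ = min(references,
                   key=lambda item: mean_squared_error(trajectory, item[1]))
    return label


if __name__ == "__main__":
    # Hypothetical labelled reference skills with toy joint-angle trajectories.
    references = [
        ("skill_a", [0.0, 0.5, 1.0, 0.5, 0.0]),
        ("skill_b", [0.0, 1.0, 2.0, 1.0, 0.0]),
    ]
    print(classify([0.1, 0.9, 1.8, 1.1, 0.0], references))
```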
Submitted 11 September, 2017;
originally announced September 2017.
-
LiDAR-based Control of Autonomous Rotorcraft for the Inspection of Pier-like Structures: Proofs
Authors:
Bruno J. Guerreiro,
Carlos Silvestre,
Rita Cunha,
David Cabecinhas
Abstract:
This is a complementary document to the paper presented in [1], to provide more detailed proofs for some results. The main paper addresses the problem of trajectory tracking control of autonomous rotorcraft in operation scenarios where only relative position measurements obtained from LiDAR sensors are possible. The proposed approach defines an alternative kinematic model, directly based on LiDAR measurements, and uses a trajectory-dependent error space to express the dynamic model of the vehicle. An LPV representation with piecewise affine dependence on the parameters is adopted to describe the error dynamics over a set of predefined operating regions, and a continuous-time $H_2$ control problem is solved using LMIs and implemented within the scope of gain-scheduling control theory.
Submitted 3 May, 2017;
originally announced May 2017.