Search | arXiv e-print repository

Efficient FPGA Implementation of Time-Domain Popcount for Low-Complexity Machine Learning

Authors: Shengyu Duan, Marcos L. L. Sartori, Rishad Shafik, Alex Yakovlev, Emre Ozer

Abstract: Population count (popcount) is a crucial operation for many low-complexity machine learning (ML) algorithms, including Tsetlin Machine (TM)-a promising new ML method, particularly well-suited for solving classification tasks. The inference mechanism in TM consists of propositional logic-based structures within each class, followed by a majority voting scheme, which makes the classification decisio… ▽ More Population count (popcount) is a crucial operation for many low-complexity machine learning (ML) algorithms, including Tsetlin Machine (TM)-a promising new ML method, particularly well-suited for solving classification tasks. The inference mechanism in TM consists of propositional logic-based structures within each class, followed by a majority voting scheme, which makes the classification decision. In TM, the voters are the outputs of Boolean clauses. The voting mechanism comprises two operations: popcount for each class and determining the class with the maximum vote by means of an argmax operation. While TMs offer a lightweight ML alternative, their performance is often limited by the high computational cost of popcount and comparison required to produce the argmax result. In this paper, we propose an innovative approach to accelerate and optimize these operations by performing them in the time domain. Our time-domain implementation uses programmable delay lines (PDLs) and arbiters to efficiently manage these tasks through delay-based mechanisms. We also present an FPGA design flow for practical implementation of the time-domain popcount, addressing delay skew and ensuring that the behavior matches that of the model's intended functionality. By leveraging the natural compatibility of the proposed popcount with asynchronous architectures, we demonstrate significant improvements in an asynchronous TM, including up to 38% reduction in latency, 43.1% reduction in dynamic power, and 15% savings in resource utilization, compared to synchronous TMs using adder-based popcount. △ Less

Submitted 4 May, 2025; originally announced May 2025.

arXiv:2504.19797 [pdf, other]

Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs

Authors: Gang Mao, Tousif Rahman, Sidharth Maheshwari, Bob Pattison, Zhuang Shao, Rishad Shafik, Alex Yakovlev

Abstract: The increased demand for data privacy and security in machine learning (ML) applications has put impetus on effective edge training on Internet-of-Things (IoT) nodes. Edge training aims to leverage speed, energy efficiency and adaptability within the resource constraints of the nodes. Deploying and training Deep Neural Networks (DNNs)-based models at the edge, although accurate, posit significant… ▽ More The increased demand for data privacy and security in machine learning (ML) applications has put impetus on effective edge training on Internet-of-Things (IoT) nodes. Edge training aims to leverage speed, energy efficiency and adaptability within the resource constraints of the nodes. Deploying and training Deep Neural Networks (DNNs)-based models at the edge, although accurate, posit significant challenges from the back-propagation algorithm's complexity, bit precision trade-offs, and heterogeneity of DNN layers. This paper presents a Dynamic Tsetlin Machine (DTM) training accelerator as an alternative to DNN implementations. DTM utilizes logic-based on-chip inference with finite-state automata-driven learning within the same Field Programmable Gate Array (FPGA) package. Underpinned on the Vanilla and Coalesced Tsetlin Machine algorithms, the dynamic aspect of the accelerator design allows for a run-time reconfiguration targeting different datasets, model architectures, and model sizes without resynthesis. This makes the DTM suitable for targeting multivariate sensor-based edge tasks. Compared to DNNs, DTM trains with fewer multiply-accumulates, devoid of derivative computation. It is a data-centric ML algorithm that learns by aligning Tsetlin automata with input data to form logical propositions enabling efficient Look-up-Table (LUT) mapping and frugal Block RAM usage in FPGA training implementations. The proposed accelerator offers 2.54x more Giga operations per second per Watt (GOP/s per W) and uses 6x less power than the next-best comparable design. △ Less

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2502.07823 [pdf, other]

Runtime Tunable Tsetlin Machines for Edge Inference on eFPGAs

Authors: Tousif Rahman, Gang Mao, Bob Pattison, Sidharth Maheshwari, Marcos Sartori, Adrian Wheeldon, Rishad Shafik, Alex Yakovlev

Abstract: Embedded Field-Programmable Gate Arrays (eFPGAs) allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget compared with traditional FPGA platforms. However, the limited eFPGA logic and memory significantly constrain compute capabilities and model size. As such, ML application deployment on eFPGAs is in direct contrast with the most recent FPG… ▽ More Embedded Field-Programmable Gate Arrays (eFPGAs) allow for the design of hardware accelerators of edge Machine Learning (ML) applications at a lower power budget compared with traditional FPGA platforms. However, the limited eFPGA logic and memory significantly constrain compute capabilities and model size. As such, ML application deployment on eFPGAs is in direct contrast with the most recent FPGA approaches developing architecture-specific implementations and maximizing throughput over resource frugality. This paper focuses on the opposite side of this trade-off: the proposed eFPGA accelerator focuses on minimizing resource usage and allowing flexibility for on-field recalibration over throughput. This allows for runtime changes in model size, architecture, and input data dimensionality without offline resynthesis. This is made possible through the use of a bitwise compressed inference architecture of the Tsetlin Machine (TM) algorithm. TM compute does not require any multiplication operations, being limited to only bitwise AND, OR, NOT, summations and additions. Additionally, TM model compression allows the entire model to fit within the on-chip block RAM of the eFPGA. The paper uses this accelerator to propose a strategy for runtime model tuning in the field. The proposed approach uses 2.5x fewer Look-up-Tables (LUTs) and 3.38x fewer registers than the current most resource-fugal design and achieves up to 129x energy reduction compared with low-power microcontrollers running the same ML application. △ Less

Submitted 10 February, 2025; originally announced February 2025.

Comments: Accepted as a full paper by the 2025 EDGE AI FOUNDATION Austin

arXiv:2502.05640 [pdf, other]

ETHEREAL: Energy-efficient and High-throughput Inference using Compressed Tsetlin Machine

Authors: Shengyu Duan, Rishad Shafik, Alex Yakovlev

Abstract: The Tsetlin Machine (TM) is a novel alternative to deep neural networks (DNNs). Unlike DNNs, which rely on multi-path arithmetic operations, a TM learns propositional logic patterns from data literals using Tsetlin automata. This fundamental shift from arithmetic to logic underpinning makes TM suitable for empowering new applications with low-cost implementations. In TM, literals are often inclu… ▽ More The Tsetlin Machine (TM) is a novel alternative to deep neural networks (DNNs). Unlike DNNs, which rely on multi-path arithmetic operations, a TM learns propositional logic patterns from data literals using Tsetlin automata. This fundamental shift from arithmetic to logic underpinning makes TM suitable for empowering new applications with low-cost implementations. In TM, literals are often included by both positive and negative clauses within the same class, canceling out their impact on individual class definitions. This property can be exploited to develop compressed TM models, enabling energy-efficient and high-throughput inferences for machine learning (ML) applications. We introduce a training approach that incorporates excluded automata states to sparsify TM logic patterns in both positive and negative clauses. This exclusion is iterative, ensuring that highly class-correlated (and therefore significant) literals are retained in the compressed inference model, ETHEREAL, to maintain strong classification accuracy. Compared to standard TMs, ETHEREAL TM models can reduce model size by up to 87.54%, with only a minor accuracy compromise. We validate the impact of this compression on eight real-world Tiny machine learning (TinyML) datasets against standard TM, equivalent Random Forest (RF) and Binarized Neural Network (BNN) on the STM32F746G-DISCO platform. Our results show that ETHEREAL TM models achieve over an order of magnitude reduction in inference time (resulting in higher throughput) and energy consumption compared to BNNs, while maintaining a significantly smaller memory footprint compared to RFs. △ Less

Submitted 8 February, 2025; originally announced February 2025.

Comments: Accepted as a full paper by the 2025 EDGE AI FOUNDATION Austin

arXiv:2501.19347 [pdf, other]

An All-digital 8.6-nJ/Frame 65-nm Tsetlin Machine Image Classification Accelerator

Authors: Svein Anders Tunheim, Yujin Zheng, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo

Abstract: We present an all-digital programmable machine learning accelerator chip for image classification, underpinning on the Tsetlin machine (TM) principles. The TM is an emerging machine learning algorithm founded on propositional logic, utilizing sub-pattern recognition expressions called clauses. The accelerator implements the coalesced TM version with convolution, and classifies booleanized images o… ▽ More We present an all-digital programmable machine learning accelerator chip for image classification, underpinning on the Tsetlin machine (TM) principles. The TM is an emerging machine learning algorithm founded on propositional logic, utilizing sub-pattern recognition expressions called clauses. The accelerator implements the coalesced TM version with convolution, and classifies booleanized images of 28$\times$28 pixels with 10 categories. A configuration with 128 clauses is used in a highly parallel architecture. Fast clause evaluation is achieved by keeping all clause weights and Tsetlin automata (TA) action signals in registers. The chip is implemented in a 65 nm low-leakage CMOS technology, and occupies an active area of 2.7 mm$^2$. At a clock frequency of 27.8 MHz, the accelerator achieves 60.3k classifications per second, and consumes 8.6 nJ per classification. This demonstrates the energy-efficiency of the TM, which was the main motivation for developing this chip. The latency for classifying a single image is 25.4 $μ$s which includes system timing overhead. The accelerator achieves 97.42%, 84.54% and 82.55% test accuracies for the datasets MNIST, Fashion-MNIST and Kuzushiji-MNIST, respectively, matching the TM software models. △ Less

Submitted 2 April, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: This work has been submitted to the IEEE for possible publication

ACM Class: B.7

arXiv:2412.05327 [pdf, other]

IMPACT:InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference

Authors: Omar Ghazal, Wei Wang, Shahar Kvatinsky, Farhad Merchant, Alex Yakovlev, Rishad Shafik

Abstract: The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of traditional von Neumann architecture. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latenc… ▽ More The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of traditional von Neumann architecture. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latency and energy. In this paper, we present the IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference, underpinned on a cutting-edge memory device, Y-Flash, fabricated on a 180 nm CMOS process. Y-Flash devices have recently been demonstrated for digital and analog memory applications, offering high yield, non-volatility, and low power consumption. The IMPACT leverages the Y-Flash array to implement the inference of a novel machine learning algorithm: coalesced Tsetlin machine (CoTM) based on propositional logic. CoTM utilizes Tsetlin automata (TA) to create Boolean feature selections stochastically across parallel clauses. The IMPACT is organized into two computational crossbars for storing the TA and weights. Through validation on the MNIST dataset, IMPACT achieved 96.3% accuracy. The IMPACT demonstrated improvements in energy efficiency, e.g., 2.23X over CNN-based ReRAM, 2.46X over Neuromorphic using NOR-Flash, and 2.06X over DNN-based PCM, suited for modern ML inference applications. △ Less

Submitted 4 December, 2024; originally announced December 2024.

Comments: 27 Pages, 14 Figures, 6 Tables

arXiv:2408.09456 [pdf, other]

In-Memory Learning Automata Architecture using Y-Flash Cell

Authors: Omar Ghazal, Tian Lan, Shalman Ojukwu, Komal Krishnamurthy, Alex Yakovlev, Rishad Shafik

Abstract: The modern implementation of machine learning architectures faces significant challenges due to frequent data transfer between memory and processing units. In-memory computing, primarily through memristor-based analog computing, offers a promising solution to overcome this von Neumann bottleneck. In this technology, data processing and storage are located inside the memory. Here, we introduce a no… ▽ More The modern implementation of machine learning architectures faces significant challenges due to frequent data transfer between memory and processing units. In-memory computing, primarily through memristor-based analog computing, offers a promising solution to overcome this von Neumann bottleneck. In this technology, data processing and storage are located inside the memory. Here, we introduce a novel approach that utilizes floating-gate Y-Flash memristive devices manufactured with a standard 180 nm CMOS process. These devices offer attractive features, including analog tunability and moderate device-to-device variation; such characteristics are essential for reliable decision-making in ML applications. This paper uses a new machine learning algorithm, the Tsetlin Machine (TM), for in-memory processing architecture. The TM's learning element, Automaton, is mapped into a single Y-Flash cell, where the Automaton's range is transferred into the Y-Flash's conductance scope. Through comprehensive simulations, the proposed hardware implementation of the learning automata, particularly for Tsetlin machines, has demonstrated enhanced scalability and on-edge learning capabilities. △ Less

Submitted 18 August, 2024; originally announced August 2024.

arXiv:2403.10538 [pdf, other]

MATADOR: Automated System-on-Chip Tsetlin Machine Design Generation for Edge Applications

Authors: Tousif Rahman, Gang Mao, Sidharth Maheshwari, Rishad Shafik, Alex Yakovlev

Abstract: System-on-Chip Field-Programmable Gate Arrays (SoC-FPGAs) offer significant throughput gains for machine learning (ML) edge inference applications via the design of co-processor accelerator systems. However, the design effort for training and translating ML models into SoC-FPGA solutions can be substantial and requires specialist knowledge aware trade-offs between model performance, power consumpt… ▽ More System-on-Chip Field-Programmable Gate Arrays (SoC-FPGAs) offer significant throughput gains for machine learning (ML) edge inference applications via the design of co-processor accelerator systems. However, the design effort for training and translating ML models into SoC-FPGA solutions can be substantial and requires specialist knowledge aware trade-offs between model performance, power consumption, latency and resource utilization. Contrary to other ML algorithms, Tsetlin Machine (TM) performs classification by forming logic proposition between boolean actions from the Tsetlin Automata (the learning elements) and boolean input features. A trained TM model, usually, exhibits high sparsity and considerable overlapping of these logic propositions both within and among the classes. The model, thus, can be translated to RTL-level design using a miniscule number of AND and NOT gates. This paper presents MATADOR, an automated boolean-to-silicon tool with GUI interface capable of implementing optimized accelerator design of the TM model onto SoC-FPGA for inference at the edge. It offers automation of the full development pipeline: model training, system level design generation, design verification and deployment. It makes use of the logic sharing that ensues from propositional overlap and creates a compact design by effectively utilizing the TM model's sparsity. MATADOR accelerator designs are shown to be up to 13.4x faster, up to 7x more resource frugal and up to 2x more power efficient when compared to the state-of-the-art Quantized and Binary Deep Neural Network implementations. △ Less

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.03109 [pdf, other]

Computing with Clocks

Authors: Jonathan Edwards, Alex Yakovlev, Simon O'Keefe

Abstract: Clocks are a central part of many computing paradigms, and are mainly used to synchronise the delicate operation of switching, necessary to drive modern computational processes. Unfortunately, this synchronisation process is reaching a natural ``apocalypse''. No longer can clock scaling be used as a blunt tool to accelerate computation, we are up against the natural limits of switching and synchro… ▽ More Clocks are a central part of many computing paradigms, and are mainly used to synchronise the delicate operation of switching, necessary to drive modern computational processes. Unfortunately, this synchronisation process is reaching a natural ``apocalypse''. No longer can clock scaling be used as a blunt tool to accelerate computation, we are up against the natural limits of switching and synchronisation across large processors. Therefore, we need to rethink how time is utilised in computation, using it more naturally in the role of representing data. This can be achieved by using a time interval delineated by discrete start and end events, and by re-casting computational operations into the time domain. With this, computer systems can be developed that are naturally scaleable in time and space, and can use ambient time references built to the best effort of the available technology. Our ambition is to better manage the energy/computation time trade-off, and to explicitly embed the resolution of the data in the time domain. We aim to recast calculations into the ``for free'' format that time offers, and in addition, perform these calculations at the highest clock or oscillator resolution possible. △ Less

Submitted 5 February, 2024; originally announced February 2024.

MSC Class: 68-01 ACM Class: C.1

arXiv:2310.11481 [pdf, other]

Contracting Tsetlin Machine with Absorbing Automata

Authors: Bimal Bhattarai, Ole-Christoffer Granmo, Lei Jiao, Per-Arne Andersen, Svein Anders Tunheim, Rishad Shafik, Alex Yakovlev

Abstract: In this paper, we introduce a sparse Tsetlin Machine (TM) with absorbing Tsetlin Automata (TA) states. In brief, the TA of each clause literal has both an absorbing Exclude- and an absorbing Include state, making the learning scheme absorbing instead of ergodic. When a TA reaches an absorbing state, it will never leave that state again. If the absorbing state is an Exclude state, both the automato… ▽ More In this paper, we introduce a sparse Tsetlin Machine (TM) with absorbing Tsetlin Automata (TA) states. In brief, the TA of each clause literal has both an absorbing Exclude- and an absorbing Include state, making the learning scheme absorbing instead of ergodic. When a TA reaches an absorbing state, it will never leave that state again. If the absorbing state is an Exclude state, both the automaton and the literal can be removed from further consideration. The literal will as a result never participates in that clause. If the absorbing state is an Include state, on the other hand, the literal is stored as a permanent part of the clause while the TA is discarded. A novel sparse data structure supports these updates by means of three action lists: Absorbed Include, Include, and Exclude. By updating these lists, the TM gets smaller and smaller as the literals and their TA withdraw. In this manner, the computation accelerates during learning, leading to faster learning and less energy consumption. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted to ISTM2023. 7 pages, 8 figures

arXiv:2306.01027 [pdf, other]

An FPGA Architecture for Online Learning using the Tsetlin Machine

Authors: Samuel Prescott, Adrian Wheeldon, Rishad Shafik, Tousif Rahman, Alex Yakovlev, Ole-Christoffer Granmo

Abstract: There is a need for machine learning models to evolve in unsupervised circumstances. New classifications may be introduced, unexpected faults may occur, or the initial dataset may be small compared to the data-points presented to the system during normal operation. Implementing such a system using neural networks involves significant mathematical complexity, which is a major issue in power-critica… ▽ More There is a need for machine learning models to evolve in unsupervised circumstances. New classifications may be introduced, unexpected faults may occur, or the initial dataset may be small compared to the data-points presented to the system during normal operation. Implementing such a system using neural networks involves significant mathematical complexity, which is a major issue in power-critical edge applications. This paper proposes a novel field-programmable gate-array infrastructure for online learning, implementing a low-complexity machine learning algorithm called the Tsetlin Machine. This infrastructure features a custom-designed architecture for run-time learning management, providing on-chip offline and online learning. Using this architecture, training can be carried out on-demand on the \ac{FPGA} with pre-classified data before inference takes place. Additionally, our architecture provisions online learning, where training can be interleaved with inference during operation. Tsetlin Machine (TM) training naturally descends to an optimum, with training also linked to a threshold hyper-parameter which is used to reduce the probability of issuing feedback as the TM becomes trained further. The proposed architecture is modular, allowing the data input source to be easily changed, whilst inbuilt cross-validation infrastructure allows for reliable and representative results during system testing. We present use cases for online learning using the proposed infrastructure and demonstrate the energy/performance/accuracy trade-offs. △ Less

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.12914 [pdf, other]

IMBUE: In-Memory Boolean-to-CUrrent Inference ArchitecturE for Tsetlin Machines

Authors: Omar Ghazal, Simranjeet Singh, Tousif Rahman, Shengqi Yu, Yujin Zheng, Domenico Balsamo, Sachin Patkar, Farhad Merchant, Fei Xia, Alex Yakovlev, Rishad Shafik

Abstract: In-memory computing for Machine Learning (ML) applications remedies the von Neumann bottlenecks by organizing computation to exploit parallelism and locality. Non-volatile memory devices such as Resistive RAM (ReRAM) offer integrated switching and storage capabilities showing promising performance for ML applications. However, ReRAM devices have design challenges, such as non-linear digital-analog… ▽ More In-memory computing for Machine Learning (ML) applications remedies the von Neumann bottlenecks by organizing computation to exploit parallelism and locality. Non-volatile memory devices such as Resistive RAM (ReRAM) offer integrated switching and storage capabilities showing promising performance for ML applications. However, ReRAM devices have design challenges, such as non-linear digital-analog conversion and circuit overheads. This paper proposes an In-Memory Boolean-to-Current Inference Architecture (IMBUE) that uses ReRAM-transistor cells to eliminate the need for such conversions. IMBUE processes Boolean feature inputs expressed as digital voltages and generates parallel current paths based on resistive memory states. The proportional column current is then translated back to the Boolean domain for further digital processing. The IMBUE architecture is inspired by the Tsetlin Machine (TM), an emerging ML algorithm based on intrinsically Boolean logic. The IMBUE architecture demonstrates significant performance improvements over binarized convolutional neural networks and digital TM in-memory implementations, achieving up to a 12.99x and 5.28x increase, respectively. △ Less

Submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted at ACM/IEEE International Symposium on Low Power Electronics and Design 2023 (ISLPED 2023)

arXiv:2305.11928 [pdf, other]

Energy-frugal and Interpretable AI Hardware Design using Learning Automata

Authors: Rishad Shafik, Tousif Rahman, Adrian Wheeldon, Ole-Christoffer Granmo, Alex Yakovlev

Abstract: Energy efficiency is a crucial requirement for enabling powerful artificial intelligence applications at the microedge. Hardware acceleration with frugal architectural allocation is an effective method for reducing energy. Many emerging applications also require the systems design to incorporate interpretable decision models to establish responsibility and transparency. The design needs to provisi… ▽ More Energy efficiency is a crucial requirement for enabling powerful artificial intelligence applications at the microedge. Hardware acceleration with frugal architectural allocation is an effective method for reducing energy. Many emerging applications also require the systems design to incorporate interpretable decision models to establish responsibility and transparency. The design needs to provision for additional resources to provide reachable states in real-world data scenarios, defining conflicting design tradeoffs between energy efficiency. is challenging. Recently a new machine learning algorithm, called the Tsetlin machine, has been proposed. The algorithm is fundamentally based on the principles of finite-state automata and benefits from natural logic underpinning rather than arithmetic. In this paper, we investigate methods of energy-frugal artificial intelligence hardware design by suitably tuning the hyperparameters, while maintaining high learning efficacy. To demonstrate interpretability, we use reachability and game-theoretic analysis in two simulation environments: a SystemC model to study the bounded state transitions in the presence of hardware faults and Nash equilibrium between states to analyze the learning convergence. Our analyses provides the first insights into conflicting design tradeoffs involved in energy-efficient and interpretable decision models for this new artificial intelligence hardware architecture. We show that frugal resource allocation coupled with systematic prodigality between randomized reinforcements can provide decisive energy reduction while also achieving robust and interpretable learning. △ Less

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.13552 [pdf, other]

Finite State Automata Design using 1T1R ReRAM Crossbar

Authors: Simranjeet Singh, Omar Ghazal, Chandan Kumar Jha, Vikas Rana, Rolf Drechsler, Rishad Shafik, Alex Yakovlev, Sachin Patkar, Farhad Merchant

Abstract: Data movement costs constitute a significant bottleneck in modern machine learning (ML) systems. When combined with the computational complexity of algorithms, such as neural networks, designing hardware accelerators with low energy footprint remains challenging. Finite state automata (FSA) constitute a type of computation model used as a low-complexity learning unit in ML systems. The implementat… ▽ More Data movement costs constitute a significant bottleneck in modern machine learning (ML) systems. When combined with the computational complexity of algorithms, such as neural networks, designing hardware accelerators with low energy footprint remains challenging. Finite state automata (FSA) constitute a type of computation model used as a low-complexity learning unit in ML systems. The implementation of FSA consists of a number of memory states. However, FSA can be in one of the states at a given time. It switches to another state based on the present state and input to the FSA. Due to its natural synergy with memory, it is a promising candidate for in-memory computing for reduced data movement costs. This work focuses on a novel FSA implementation using resistive RAM (ReRAM) for state storage in series with a CMOS transistor for biasing controls. We propose using multi-level ReRAM technology capable of transitioning between states depending on bias pulse amplitude and duration. We use an asynchronous control circuit for writing each ReRAM-transistor cell for the on-demand switching of the FSA. We investigate the impact of the device-to-device and cycle-to-cycle variations on the cell and show that FSA transitions can be seamlessly achieved without degradation of performance. Through extensive experimental evaluation, we demonstrate the implementation of FSA on 1T1R ReRAM crossbar. △ Less

Submitted 30 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted by 21st IEEE Interregional NEWCAS Conference 2023 (NEWCAS 2023)

arXiv:2301.10005 [pdf, other]

An Event-Driven Approach To Genotype Imputation On A Custom RISC-V FPGA Cluster

Authors: Jordan Morris, Ashur Rafiev, Graeme Bragg, Mark Vousden, David Thomas, Alex Yakovlev, Andrew Brown

Abstract: This paper proposes an event-driven solution to genotype imputation, a technique used to statistically infer missing genetic markers in DNA. The work implements the widely accepted Li and Stephens model, primary contributor to the computational complexity of modern x86 solutions, in an attempt to determine whether further investigation of the application is warranted in the event-driven domain. Th… ▽ More This paper proposes an event-driven solution to genotype imputation, a technique used to statistically infer missing genetic markers in DNA. The work implements the widely accepted Li and Stephens model, primary contributor to the computational complexity of modern x86 solutions, in an attempt to determine whether further investigation of the application is warranted in the event-driven domain. The model is implemented using graph-based Hidden Markov Modeling and executed as a customized forward/backward dynamic programming algorithm. The solution uses an event-driven paradigm to map the algorithm to thousands of concurrent cores, where events are small messages that carry both control and data within the algorithm. The design of a single processing element is discussed. This is then extended across multiple FPGAs and executed on a custom RISC-V NoC FPGA cluster called POETS. Results demonstrate how the algorithm scales over increasing hardware resources and a 48 FPGA run demonstrates a 270X reduction in wall-clock processing time when compared to a single-threaded x86 solution. Optimisation of the algorithm via linear interpolation is then introduced and tested, with results demonstrating a wall-clock reduction time of approx. 5 orders of magnitude when compared to a similarly optimised x86 solution. △ Less

Submitted 22 January, 2023; originally announced January 2023.

arXiv:2201.01846 [pdf, other]

Modelling Hospital Strategies in City-Scale Ambulance Dispatching

Authors: Xinyu Fu, Valeria Krzhizhanovskaya, Alexey Yakovlev, Sergey Kovalchuk

Abstract: The optimisation in the ambulance dispatching process is significant for patients who need early treatments. However, the problem of dynamic ambulance redeployment for destination hospital selection has rarely been investigated. The paper proposes an approach to model and simulate the ambulance dispatching process in multi-agents healthcare environments of large cities. The proposed approach is ba… ▽ More The optimisation in the ambulance dispatching process is significant for patients who need early treatments. However, the problem of dynamic ambulance redeployment for destination hospital selection has rarely been investigated. The paper proposes an approach to model and simulate the ambulance dispatching process in multi-agents healthcare environments of large cities. The proposed approach is based on using the coupled game-theoretic (GT) approach to identify hospital strategies (considering hospitals as players within a non-cooperative game) and performing discrete-event simulation (DES) of patient delivery and provision of healthcare services to evaluate ambulance dispatching (selection of target hospital). Assuming the collective nature of decisions on patient delivery, the approach assesses the influence of the diverse behaviours of hospitals on system performance with possible further optimisation of this performance. The approach is studied through a series of cases starting with a simplified 1D model and proceeding with a coupled 2D model and real-world application. The study considers the problem of dispatching ambulances to patients with the ACS directed to the PCI in the target hospital. A real-world case study of data from Saint Petersburg (Russia) is analysed showing the better conformity of the global characteristics (mortality rate) of the healthcare system with the proposed approach being applied to discovering the agents' diverse behaviour. △ Less

Submitted 5 January, 2022; originally announced January 2022.

arXiv:2109.00846 [pdf, other]

Self-timed Reinforcement Learning using Tsetlin Machine

Authors: Adrian Wheeldon, Alex Yakovlev, Rishad Shafik

Abstract: We present a hardware design for the learning datapath of the Tsetlin machine algorithm, along with a latency analysis of the inference datapath. In order to generate a low energy hardware which is suitable for pervasive artificial intelligence applications, we use a mixture of asynchronous design techniques - including Petri nets, signal transition graphs, dual-rail and bundled-data. The work bui… ▽ More We present a hardware design for the learning datapath of the Tsetlin machine algorithm, along with a latency analysis of the inference datapath. In order to generate a low energy hardware which is suitable for pervasive artificial intelligence applications, we use a mixture of asynchronous design techniques - including Petri nets, signal transition graphs, dual-rail and bundled-data. The work builds on previous design of the inference hardware, and includes an in-depth breakdown of the automaton feedback, probability generation and Tsetlin automata. Results illustrate the advantages of asynchronous design in applications such as personalized healthcare and battery-powered internet of things devices, where energy is limited and latency is an important figure of merit. Challenges of static timing analysis in asynchronous circuits are also addressed. △ Less

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2102.01348 [pdf]

QoS-Aware Power Minimization of Distributed Many-Core Servers using Transfer Q-Learning

Authors: Dainius Jenkus, Fei Xia, Rishad Shafik, Alex Yakovlev

Abstract: Web servers scaled across distributed systems necessitate complex runtime controls for providing quality of service (QoS) guarantees as well as minimizing the energy costs under dynamic workloads. This paper presents a QoS-aware runtime controller using horizontal scaling (node allocation) and vertical scaling (resource allocation within nodes) methods synergistically to provide adaptation to work… ▽ More Web servers scaled across distributed systems necessitate complex runtime controls for providing quality of service (QoS) guarantees as well as minimizing the energy costs under dynamic workloads. This paper presents a QoS-aware runtime controller using horizontal scaling (node allocation) and vertical scaling (resource allocation within nodes) methods synergistically to provide adaptation to workloads while minimizing the power consumption under QoS constraint (i.e., response time). A horizontal scaling determines the number of active nodes based on workload demands and the required QoS according to a set of rules. Then, it is coupled with vertical scaling using transfer Q-learning, which further tunes power/performance based on workload profile using dynamic voltage/frequency scaling (DVFS). It transfers Q-values within minimally explored states reducing exploration requirements. In addition, the approach exploits a scalable architecture of the many-core server allowing to reuse available knowledge from fully or partially explored nodes. When combined, these methods allow to reduce the exploration time and QoS violations when compared to model-free Q-learning. The technique balances design-time and runtime costs to maximize the portability and operational optimality demonstrated through persistent power reductions with minimal QoS violations under different workload scenarios on heterogeneous multi-processing nodes of a server cluster. △ Less

Submitted 2 February, 2021; originally announced February 2021.

Comments: Presented at DATE Friday Workshop on System-level Design Methods for Deep Learning on Heterogeneous Architectures (SLOHA 2021) (arXiv:2102.00818)

Report number: SLOHA/2021/07

arXiv:2101.11336 [pdf, other]

Low-Power Audio Keyword Spotting using Tsetlin Machines

Authors: Jie Lei, Tousif Rahman, Rishad Shafik, Adrian Wheeldon, Alex Yakovlev, Ole-Christoffer Granmo, Fahim Kawsar, Akhil Mathur

Abstract: The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human to machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity of current Neural Network (NN) powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning automata powered machine learning algorith… ▽ More The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human to machine interaction. Yet, the challenge of end-to-end energy efficiency, memory footprint and system complexity of current Neural Network (NN) powered AI-KWS pipelines has remained ever present. This paper evaluates KWS utilizing a learning automata powered machine learning algorithm called the Tsetlin Machine (TM). Through significant reduction in parameter requirements and choosing logic over arithmetic based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper we explore a TM based keyword spotting (KWS) pipeline to demonstrate low complexity with faster rate of convergence compared to NNs. Further, we investigate the scalability with increasing keywords and explore the potential for enabling low-power on-chip KWS. △ Less

Submitted 27 January, 2021; originally announced January 2021.

Comments: 20 pp

Journal ref: Pre-print of original submission to Journal of Low Power Electronics and Applications, 2021

arXiv:2012.03402 [pdf, other]

Low-Latency Asynchronous Logic Design for Inference at the Edge

Authors: Adrian Wheeldon, Alex Yakovlev, Rishad Shafik, Jordan Morris

Abstract: Modern internet of things (IoT) devices leverage machine learning inference using sensed data on-device rather than offloading them to the cloud. Commonly known as inference at-the-edge, this gives many benefits to the users, including personalization and security. However, such applications demand high energy efficiency and robustness. In this paper we propose a method for reduced area and power… ▽ More Modern internet of things (IoT) devices leverage machine learning inference using sensed data on-device rather than offloading them to the cloud. Commonly known as inference at-the-edge, this gives many benefits to the users, including personalization and security. However, such applications demand high energy efficiency and robustness. In this paper we propose a method for reduced area and power overhead of self-timed early-propagative asynchronous inference circuits, designed using the principles of learning automata. Due to natural resilience to timing as well as logic underpinning, the circuits are tolerant to variations in environment and supply voltage whilst enabling the lowest possible latency. Our method is exemplified through an inference datapath for a low power machine learning application. The circuit builds on the Tsetlin machine algorithm further enhancing its energy efficiency. Average latency of the proposed circuit is reduced by 10x compared with the synchronous implementation whilst maintaining similar area. Robustness of the proposed circuit is proven through post-synthesis simulation with 0.25 V to 1.2 V supply. Functional correctness is maintained and latency scales with gate delay as voltage is decreased. △ Less

Submitted 6 December, 2020; originally announced December 2020.

Journal ref: Proc. Conf. Des. Autom. Test Eur., Feb. 2021, pp. 370-373

arXiv:2007.02114 [pdf, other]

A Novel Multi-Step Finite-State Automaton for Arbitrarily Deterministic Tsetlin Machine Learning

Authors: K. Darshana Abeyrathna, Ole-Christoffer Granmo, Rishad Shafik, Alex Yakovlev, Adrian Wheeldon, Jie Lei, Morten Goodwin

Abstract: Due to the high energy consumption and scalability challenges of deep learning, there is a critical need to shift research focus towards dealing with energy consumption constraints. Tsetlin Machines (TMs) are a recent approach to machine learning that has demonstrated significantly reduced energy usage compared to neural networks alike, while performing competitively accuracy-wise on several bench… ▽ More Due to the high energy consumption and scalability challenges of deep learning, there is a critical need to shift research focus towards dealing with energy consumption constraints. Tsetlin Machines (TMs) are a recent approach to machine learning that has demonstrated significantly reduced energy usage compared to neural networks alike, while performing competitively accuracy-wise on several benchmarks. However, TMs rely heavily on energy-costly random number generation to stochastically guide a team of Tsetlin Automata to a Nash Equilibrium of the TM game. In this paper, we propose a novel finite-state learning automaton that can replace the Tsetlin Automata in TM learning, for increased determinism. The new automaton uses multi-step deterministic state jumps to reinforce sub-patterns. Simultaneously, flipping a coin to skip every $d$'th state update ensures diversification by randomization. The $d$-parameter thus allows the degree of randomization to be finely controlled. E.g., $d=1$ makes every update random and $d=\infty$ makes the automaton completely deterministic. Our empirical results show that, overall, only substantial degrees of determinism reduces accuracy. Energy-wise, random number generation constitutes switching energy consumption of the TM, saving up to 11 mW power for larger datasets with high $d$ values. We can thus use the new $d$-parameter to trade off accuracy against energy consumption, to facilitate low-energy machine learning. △ Less

Submitted 4 July, 2020; originally announced July 2020.

Comments: 10 pages, 8 figures, 7 tables

arXiv:2004.09290 [pdf]

Investigating Coordination of Hospital Departments in Delivering Healthcare for Acute Coronary Syndrome Patients using Data-Driven Network Analysis

Authors: Tesfamariam M Abuhay, Yemisrach G Getinet, Oleg G Metsker, Alexey N Yakovlev, Sergey V Kovalchuk

Abstract: Healthcare systems are challenged to deliver high-quality and efficient care. Studying patient flow in a hospital is particularly fundamental as it demonstrates effectiveness and efficiency of a hospital. Since hospital is a collection of physically nearby services under one administration, its performance and outcome are shaped by the interaction of its discrete components. Coordination of proces… ▽ More Healthcare systems are challenged to deliver high-quality and efficient care. Studying patient flow in a hospital is particularly fundamental as it demonstrates effectiveness and efficiency of a hospital. Since hospital is a collection of physically nearby services under one administration, its performance and outcome are shaped by the interaction of its discrete components. Coordination of processes at different levels of organizational structure of a hospital can be studied using network analysis. Hence, this article presents a data-driven static and temporal network of departments. Both networks are directed and weighted and constructed using seven years' (2010-2016) empirical data of 24902 Acute Coronary Syndrome (ACS) patients. The ties reflect an episode-based transfer of ACS patients from department to department in a hospital. The weight represents the number of patients transferred among departments. As a result, the underlying structure of a network of departments that deliver healthcare for ACS patients is described, the main departments and their role in the diagnosis and treatment process of ACS patients are identified, the role of departments over seven years is analyzed and communities of departments are discovered. The results of this study may help hospital administration to effectively organize and manage the coordination of departments based on their significance, strategic positioning and role in the diagnosis and treatment process which, in turn, nurtures value-based and precision healthcare. △ Less

Submitted 15 April, 2020; originally announced April 2020.

Comments: Preprint. This paper is accepted by the International Conference on Computational Science (ICCS2020) and will be published as a full paper in Springer Lecture Notes in Computer Science (LNCS)

arXiv:2004.01123 [pdf]

doi 10.1016/j.jocs.2022.101562

Surrogate-assisted performance prediction for data-driven knowledge discovery algorithms: application to evolutionary modeling of clinical pathways

Authors: Anastasia A. Funkner, Aleksey N. Yakovlev, Sergey V. Kovalchuk

Abstract: The paper proposes and investigates an approach for surrogate-assisted performance prediction of data-driven knowledge discovery algorithms. The approach is based on the identification of surrogate models for prediction of the target algorithm's quality and performance. The proposed approach was implemented and investigated as applied to an evolutionary algorithm for discovering clusters of interp… ▽ More The paper proposes and investigates an approach for surrogate-assisted performance prediction of data-driven knowledge discovery algorithms. The approach is based on the identification of surrogate models for prediction of the target algorithm's quality and performance. The proposed approach was implemented and investigated as applied to an evolutionary algorithm for discovering clusters of interpretable clinical pathways in electronic health records of patients with acute coronary syndrome. Several clustering metrics and execution time were used as the target quality and performance metrics respectively. An analytical software prototype based on the proposed approach for the prediction of algorithm characteristics and feature analysis was developed to provide a more interpretable prediction of the target algorithm's performance and quality that can be further used for parameter tuning. △ Less

Submitted 7 January, 2022; v1 submitted 2 April, 2020; originally announced April 2020.

arXiv:1910.08426 [pdf, other]

A Pulse Width Modulation based Power-elastic and Robust Mixed-signal Perceptron Design

Authors: Sergey Mileiko, Rishad Shafik, Alex Yakovlev, Jonathan Edwards

Abstract: Neural networks are exerting burgeoning influence in emerging artificial intelligence applications at the micro-edge, such as sensing systems and image processing. As many of these systems are typically self-powered, their circuits are expected to be resilient and efficient in the presence of continuous power variations caused by the harvesters. In this paper, we propose a novel mixed-signal (i.… ▽ More Neural networks are exerting burgeoning influence in emerging artificial intelligence applications at the micro-edge, such as sensing systems and image processing. As many of these systems are typically self-powered, their circuits are expected to be resilient and efficient in the presence of continuous power variations caused by the harvesters. In this paper, we propose a novel mixed-signal (i.e. analogue/digital) approach of designing a power-elastic perceptron using the principle of pulse width modulation (PWM). Fundamental to the design are a number of parallel inverters that transcode the input-weight pairs based on the principle of PWM duty cycle. Since PWM-based inverters are typically agnostic to amplitude and frequency variations, the perceptron shows a high degree of power elasticity and robustness under these variations. We show extensive design analysis in Cadence Analog Design Environment tool using a 3x3 perceptron circuit as a case study to demonstrate the resilience in the presence of parameric variations. △ Less

Submitted 17 October, 2019; originally announced October 2019.

Comments: arXiv admin note: text overlap with arXiv:1910.07492

Journal ref: Design, Automation and Test in Europe (DATE), Florence, 2019

arXiv:1910.07492 [pdf, other]

doi 10.1098/rsta.2019.0166

Neural Network Design for Energy-Autonomous AI Applications using Temporal Encoding

Authors: Sergey Mileiko, Thanasin Bunnam, Fei Xia, Rishad Shafik, Alex Yakovlev, Shidhartha Das

Abstract: Neural Networks (NNs) are steering a new generation of artificial intelligence (AI) applications at the micro-edge. Examples include wireless sensors, wearables and cybernetic systems that collect data and process them to support real-world decisions and controls. For energy autonomy, these applications are typically powered by energy harvesters. As harvesters and other power sources which provide… ▽ More Neural Networks (NNs) are steering a new generation of artificial intelligence (AI) applications at the micro-edge. Examples include wireless sensors, wearables and cybernetic systems that collect data and process them to support real-world decisions and controls. For energy autonomy, these applications are typically powered by energy harvesters. As harvesters and other power sources which provide energy autonomy inevitably have power variations, the circuits need to robustly operate over a dynamic power envelope. In other words, the NN hardware needs to be able to function correctly under unpredictable and variable supply voltages. In this paper, we propose a novel NN design approach using the principle of pulse width modulation (PWM). PWM signals represent information with their duty cycle values which may be made independent of the voltages and frequencies of the carrier signals. We design a PWM-based perceptron which can serve as the fundamental building block for NNs, by using an entirely new method of realising arithmetic in the PWM domain. We analyse the proposed approach building from a 3x3 perceptron circuit to a complex multi-layer NN. Using handwritten character recognition as an exemplar of AI applications, we demonstrate the power elasticity, resilience and efficiency of the proposed NN design in the presence of functional and parametric variations including large voltage variations in the power supply. △ Less

Submitted 15 October, 2019; originally announced October 2019.

arXiv:1710.09721 [pdf, other]

doi 10.1007/978-3-319-69775-8_11

Topological characteristics of oil and gas reservoirs and their applications

Authors: V. A. Baikov, R. R. Gilmanov, I. A. Taimanov, A. A. Yakovlev

Abstract: We demonstrate applications of topological characteristics of oil and gas reservoirs considered as three-dimensional bodies to geological modeling. We demonstrate applications of topological characteristics of oil and gas reservoirs considered as three-dimensional bodies to geological modeling. △ Less

Submitted 26 October, 2017; originally announced October 2017.

Comments: 12 pages

Journal ref: Towards Integrative Machine Learning and Knowledge Extraction, 182-193, Lecture Notes in Comput. Sci., 10344, Lecture Notes in Artificial Intelligence, Springer, Cham, 2017

arXiv:1702.07733 [pdf]

doi 10.1016/j.jbi.2018.05.004

Simulation of Patient Flow in Multiple Healthcare Units using Process and Data Mining Techniques for Model Identification

Authors: Sergey V. Kovalchuk, Anastasia A. Funkner, Oleg G. Metsker, Aleksey N. Yakovlev

Abstract: Introduction: An approach to building a hybrid simulation of patient flow is introduced with a combination of data-driven methods for automation of model identification. The approach is described with a conceptual framework and basic methods for combination of different techniques. The implementation of the proposed approach for simulation of acute coronary syndrome (ACS) was developed and used wi… ▽ More Introduction: An approach to building a hybrid simulation of patient flow is introduced with a combination of data-driven methods for automation of model identification. The approach is described with a conceptual framework and basic methods for combination of different techniques. The implementation of the proposed approach for simulation of acute coronary syndrome (ACS) was developed and used within an experimental study. Methods: Combination of data, text, and process mining techniques and machine learning approaches for analysis of electronic health records (EHRs) with discrete-event simulation (DES) and queueing theory for simulation of patient flow was proposed. The performed analysis of EHRs for ACS patients enable identification of several classes of clinical pathways (CPs) which were used to implement a more realistic simulation of the patient flow. The developed solution was implemented using Python libraries (SimPy, SciPy, and others). Results: The proposed approach enables more realistic and detailed simulation of the patient flow within a group of related departments. Experimental study shows that the improved simulation of patient length of stay for ACS patient flow obtained from EHRs in Federal Almazov North-west Medical Research Centre in Saint Petersburg, Russia. Conclusion: The proposed approach, methods, and solutions provide a conceptual, methodological, and programming framework for implementation of simulation of complex and diverse scenarios within a flow of patients for different purposes: decision making, training, management optimization, and others. △ Less

Submitted 22 January, 2018; v1 submitted 22 February, 2017; originally announced February 2017.

arXiv:1302.6885 [pdf, other]

Numerical analysis of topological characteristics of three-dimensional geological models of oil and gas fields

Authors: Ya. V. Bazaikin, V. A. Baikov, I. A. Taimanov, A. A. Yakovlev

Abstract: We discuss the study of topological characteristics of random fields that are used for numerical simulation of oil and gas reservoirs and numerical algorithms (see arXiv:1302.3669), for computing such characteristics, for which we demonstrate results of their applications. We discuss the study of topological characteristics of random fields that are used for numerical simulation of oil and gas reservoirs and numerical algorithms (see arXiv:1302.3669), for computing such characteristics, for which we demonstrate results of their applications. △ Less

Submitted 27 February, 2013; originally announced February 2013.

Comments: 19 pages, 11 figures, 1 table

Journal ref: Mathematical Modeling 25:10 (2013), 19-31

Showing 1–28 of 28 results for author: Yakovlev, A