-
Spin Wave Threshold Gate
Authors:
Arne Van Zegbroeck,
Pantazis Anagnostou,
Said Hamdioui,
Christoph Adelmann,
Florin Ciubotaru,
Sorin Cotofana
Abstract:
While Spin Wave (SW) interaction provides natural support for low-power Majority (MAJ) gate implementations, many hurdles still exist on the road towards the realization of practically relevant SW circuits. In this paper we leave the SW interaction avenue and propose Threshold Logic (TL) inspired SW computing, which relies on successive phase rotations applied to one single SW instead of on the interference of an odd number of SWs. After providing a short TL insight, we introduce the SW TL gate concept and discuss the way to map TL gate weight and threshold values onto physical phase-shifter parameters. Subsequently, we design and demonstrate proper operation of a SW TL based Full Adder (FA) by means of micro-magnetic simulations. We conclude the paper by providing insight into the potential advantages of our proposal by means of a conceptual comparison of MAJ and TL based FA implementations.
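As a purely logical illustration of the threshold-gate idea in this abstract, the sketch below builds a full adder from two threshold gates using the textbook construction. The weights and thresholds are an assumption taken from standard threshold logic, not from the paper, and the function names `threshold_gate` and `full_adder_tl` are hypothetical; nothing here models spin waves or phase shifters.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return int(weighted_sum >= threshold)


def full_adder_tl(a, b, cin):
    """Full adder from two threshold gates (textbook weights, assumed here)."""
    # Carry is a 3-input majority: at least two of the inputs are 1.
    carry = threshold_gate((a, b, cin), (1, 1, 1), 2)
    # Sum reuses the carry with weight -2, so it fires for 1 or 3 ones.
    total = threshold_gate((a, b, cin, carry), (1, 1, 1, -2), 1)
    return total, carry


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        s, c = full_adder_tl(a, b, cin)
        print(a, b, cin, "->", "sum", s, "carry", c)
```

The carry gate is also a 3-input majority function, which is why majority-based and threshold-based full adders are natural candidates for comparison.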
Submitted 22 January, 2024;
originally announced January 2024.
-
Spintronic logic: from transducers to logic gates and circuits
Authors:
Christoph Adelmann,
Florin Ciubotaru,
Fanfan Meng,
Sorin Cotofana,
Sebastien Couet
Abstract:
While magnetic solid-state memory has found commercial applications to date, magnetic logic has rather remained on a conceptual level so far. Here, we discuss open challenges of different spintronic logic approaches, which use magnetic excitations for computation. While different logic gate designs have been proposed and proof-of-concept experiments have been reported, no nontrivial operational spintronic circuit has been demonstrated due to many open challenges in spintronic circuit and system design. Furthermore, the integration of spintronic circuits in CMOS systems will require the usage of transducers between the electric (CMOS) and magnetic domains. We show that these transducers can limit the performance as well as the energy consumption of hybrid CMOS-spintronic systems. Hence, the optimization of transducer efficiency will be a major step towards competitive spintronic logic systems.
Submitted 18 January, 2024;
originally announced January 2024.
-
Roadmap for Unconventional Computing with Nanotechnology
Authors:
Giovanni Finocchio,
Jean Anne C. Incorvia,
Joseph S. Friedman,
Qu Yang,
Anna Giordano,
Julie Grollier,
Hyunsoo Yang,
Florin Ciubotaru,
Andrii Chumak,
Azad J. Naeemi,
Sorin D. Cotofana,
Riccardo Tomasello,
Christos Panagopoulos,
Mario Carpentieri,
Peng Lin,
Gang Pan,
J. Joshua Yang,
Aida Todri-Sanial,
Gabriele Boschetto,
Kremena Makasheva,
Vinod K. Sangwan,
Amit Ranjan Trivedi,
Mark C. Hersam,
Kerem Y. Camsari,
Peter L. McMahon
, et al. (26 additional authors not shown)
Abstract:
In the "Beyond Moore's Law" era, with increasing edge intelligence, domain-specific computing embracing unconventional approaches will become increasingly prevalent. At the same time, adopting a variety of nanotechnologies will offer benefits in energy cost, computational speed, reduced footprint, cyber resilience, and processing power. The time is ripe for a roadmap for unconventional computing with nanotechnologies to guide future research, and this collection aims to fill that need. The authors provide a comprehensive roadmap for neuromorphic computing using electron spins, memristive devices, two-dimensional nanomaterials, nanomagnets, and various dynamical systems. They also address other paradigms such as Ising machines, Bayesian inference engines, probabilistic computing with p-bits, processing in memory, quantum memories and algorithms, computing with skyrmions and spin waves, and brain-inspired computing for incremental learning and problem-solving in severely resource-constrained environments. These approaches have advantages over traditional Boolean computing based on von Neumann architecture. As the computational requirements for artificial intelligence grow 50 times faster than Moore's Law for electronics, more unconventional approaches to computing and signal processing will appear on the horizon, and this roadmap will help identify future needs and challenges. In this very fertile field, experts aim to present some of the dominant and most promising technologies for unconventional computing that will be around for some time to come. Within a holistic approach, the goal is to provide pathways for solidifying the field and guiding future impactful discoveries.
Submitted 27 February, 2024; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Perspectives and Challenges of Scaled Boolean Spintronic Circuits Based on Magnetic Tunnel Junction Transducers
Authors:
F. Meng,
S. -Y. Lee,
O. Zografos,
M. Gupta,
V. D. Nguyen,
G. De Micheli,
S. Cotofana,
I. Asselberghs,
C. Adelmann,
G. Sankar Kar,
S. Couet,
F. Ciubotaru
Abstract:
This paper addresses the question: Can spintronic circuits based on Magnetic Tunnel Junction (MTJ) transducers outperform their state-of-the-art CMOS counterparts? To this end, we use the EPFL combinational benchmark sets, synthesize them in 7 nm CMOS and in MTJ-based spintronic technologies, and compare the two implementation methods in terms of Energy-Delay-Product (EDP). To fully utilize the technologies' potential, CMOS and spintronic implementations are built upon standard Boolean and Majority Gates, respectively. For the spintronic circuits, we assumed that domain conversion (electric/magnetic to magnetic/electric) is performed by means of MTJs and the computation is accomplished by domain wall (DW) based majority gates, and considered two EDP estimation scenarios: (i) Uniform Benchmarking, which ignores the circuit's internal structure and only includes the domain transducers' power and delay contributions in the calculations, and (ii) Majority-Inverter-Graph Benchmarking, which also embeds the circuit structure, the associated critical path delay, and the energy consumed by DW propagation. Our results indicate that for the uniform case, the spintronic route is better suited for the implementation of complex circuits with few inputs and outputs. On the other hand, when the circuit structure is also considered via majority and inverter synthesis, our analysis clearly indicates that in order to match and eventually outperform CMOS performance, MTJ efficiency has to be improved by 3-4 orders of magnitude. While it is clear that for the time being the MTJ-based spintronic way cannot compete with CMOS, further transducer developments may tip the balance, which, when combined with information non-volatility, may make spintronic implementation viable for certain applications that require a large number of calculations and have a rather limited amount of interaction with the environment.
Submitted 29 June, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Memory-Efficient Dataflow Inference for Deep CNNs on FPGA
Authors:
Lucian Petrica,
Tobias Alonso,
Mairin Kroes,
Nicholas Fraser,
Sorin Cotofana,
Michaela Blott
Abstract:
Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However, in these accelerators the shapes of parameter memories are dictated by throughput constraints and do not map well to the underlying OCM, which becomes an implementation bottleneck. In this work, we propose an accelerator design methodology - Frequency Compensated Memory Packing (FCMP) - which improves the OCM utilization efficiency of dataflow accelerators with minimal reduction in throughput and no modifications to the physical structure of FPGA OCM. To validate our methodology, we apply it to several realizations of medium-sized CIFAR-10 inference accelerators and demonstrate up to 30% reduction in OCM utilization without loss of inference throughput, allowing us to port the accelerators from Xilinx Zynq 7020 to 7012S, reducing application cost. We also implement a custom dataflow FPGA inference accelerator for a quantized ResNet-50 CNN, utilizing on-chip weights, the largest topology ever implemented with this accelerator architecture. We demonstrate that by applying FCMP to the ResNet accelerator, the OCM bottleneck is alleviated, which enables the accelerator to be ported from Alveo U250 to the smaller Alveo U280 board with less throughput loss compared to alternative techniques. By providing a finer-grained trade-off between throughput and OCM requirements, FCMP increases the flexibility of custom dataflow CNN inference designs on FPGA.
Submitted 14 November, 2020;
originally announced November 2020.
-
Efficient Computation Reduction in Bayesian Neural Networks Through Feature Decomposition and Memorization
Authors:
Xiaotao Jia,
Jianlei Yang,
Runze Liu,
Xueyan Wang,
Sorin Dan Cotofana,
Weisheng Zhao
Abstract:
The Bayesian method is capable of capturing real-world uncertainties/incompleteness and properly addressing the over-fitting issue faced by deep neural networks. In recent years, Bayesian Neural Networks (BNNs) have drawn tremendous attention from AI researchers and proved to be successful in many applications. However, their high computational complexity makes BNNs difficult to deploy in computing systems with a limited power budget. In this paper, an efficient BNN inference flow is proposed to reduce the computation cost and is then evaluated by means of both software and hardware implementations. A feature decomposition and memorization (DM) strategy is utilized to reformulate the BNN inference flow in a reduced manner. About half of the computations can be eliminated compared to the traditional approach, as shown by theoretical analysis and software validation. Subsequently, in order to resolve the hardware resource limitations, a memory-friendly computing framework is further deployed to reduce the memory overhead introduced by the DM strategy. Finally, we implement our approach in Verilog and synthesise it with 45 nm FreePDK technology. Hardware simulation results on multi-layer BNNs demonstrate that, when compared with the traditional BNN inference method, it provides an energy consumption reduction of 73% and a 4× speedup at the expense of 14% area overhead.
Submitted 8 May, 2020;
originally announced May 2020.
-
Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA
Authors:
Mairin Kroes,
Lucian Petrica,
Sorin Cotofana,
Michaela Blott
Abstract:
Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency compared to CNN execution on CPUs or GPUs. However, the complex shapes of CNN parameter memories do not typically map well to FPGA on-chip memories (OCM), which results in poor OCM utilization and ultimately limits the size and types of CNNs which can be effectively accelerated on FPGAs. In this work, we present a design methodology that improves the mapping efficiency of CNN parameters to FPGA OCM. We frame the mapping as a bin packing problem and determine that traditional bin packing algorithms are not well suited to solve the problem within FPGA- and CNN-specific constraints. We hybridize genetic algorithms and simulated annealing with traditional bin packing heuristics to create flexible mappers capable of grouping parameter memories such that each group optimally fits FPGA on-chip memories. We evaluate these algorithms on a variety of FPGA inference accelerators. Our hybrid mappers converge to optimal solutions in a matter of seconds for all CNN use-cases, achieve an increase of up to 65% in OCM utilization efficiency for deep CNNs, and are up to 200× faster than current state-of-the-art simulated annealing approaches.
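For readers unfamiliar with the "traditional bin packing heuristics" this abstract refers to, the sketch below shows first-fit decreasing on a one-dimensional problem. It is an assumption-laden simplification and not the authors' hybrid genetic or simulated-annealing mapper: real OCM mapping involves width, depth, and port constraints that a single `capacity` number does not capture, and the name `first_fit_decreasing` and the example sizes are hypothetical.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into as few bins as the first-fit-decreasing heuristic finds.

    sizes    -- item sizes (e.g. parameter-memory sizes in arbitrary units)
    capacity -- size of each bin (e.g. one on-chip memory block, simplified)
    """
    bins = []  # contents of each bin
    free = []  # remaining room in each bin, same indexing as `bins`
    for size in sorted(sizes, reverse=True):  # largest items first
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        for i, room in enumerate(free):
            if size <= room:  # first bin with enough room wins
                bins[i].append(size)
                free[i] -= size
                break
        else:  # no existing bin fits, so open a new one
            bins.append([size])
            free.append(capacity - size)
    return bins


if __name__ == "__main__":
    packed = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
    for index, contents in enumerate(packed):
        print(f"bin {index}: {contents} (used {sum(contents)}/10)")
```

A greedy heuristic like this runs quickly but cannot express FPGA- and CNN-specific constraints, which is the limitation the abstract cites as motivation for hybrid mappers.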
Submitted 24 March, 2020;
originally announced March 2020.
-
Memristive oscillatory circuits for resolution of NP-complete logic puzzles: Sudoku case
Authors:
Theodoros Panagiotis Chatzinikolaou,
Iosif-Angelos Fyrigos,
Rafailia-Eleni Karamani,
Vasileios Ntinas,
Giorgos Dimitrakopoulos,
Sorin Cotofana,
Georgios Ch. Sirakoulis
Abstract:
Memristor networks are capable of low-power and massively parallel processing and information storage. Moreover, they have been shown to be applicable to a vast number of intelligent data analysis applications targeting mobile edge devices and low-power computing. Beyond memory and conventional computing architectures, memristors are widely studied in circuits aiming for increased intelligence that are suitable to tackle complex problems in a power- and area-efficient manner, offering viable solutions often derived from the biological principles of living organisms. In this paper, a memristive circuit exploiting the dynamics of oscillating networks is utilized for the resolution of very popular NP-complete logic puzzles, like the well-known "Sudoku". More specifically, the proposed circuit design methodology allows for appropriate usage of the advantages of interconnections in an oscillating network and of the memristor's switching dynamics, resulting in logic-solvable puzzle instances. The reduced complexity of the proposed circuit and its increased scalability constitute its main advantage over previous approaches, and the extensive SPICE-based simulations provide a clear proof of concept of the aforementioned appealing characteristics.
Submitted 15 February, 2020;
originally announced February 2020.
-
Integrated magnonic half-adder
Authors:
Qi Wang,
Roman Verba,
Thomas Brächer,
Florin Ciubotaru,
Christoph Adelmann,
Sorin D. Cotofana,
Philipp Pirro,
Andrii V. Chumak
Abstract:
Spin waves and their quanta, magnons, open up a promising branch of high-speed and low-power information processing. Several important milestones were achieved recently in the realization of separate magnonic data processing units including logic gates, a magnon transistor and units for non-Boolean computing. Nevertheless, the realization of an integrated magnonic circuit consisting of at least two logic gates and suitable for further integration is still an unresolved challenge. Here we demonstrate such an integrated circuit numerically on the example of a magnonic half-adder. Its key element is a nonlinear directional coupler serving as a combined XOR and AND logic gate that utilizes the dependence of the spin wave dispersion on its amplitude. The circuit consists of only three planar nano-waveguides and processes all information within the magnon domain. Benchmarking of the proposed device is performed, showing the potential for sub-aJ energy consumption per operation.
Submitted 8 November, 2019; v1 submitted 7 February, 2019;
originally announced February 2019.
-
Compositional Memory Systems for Multimedia Communicating Tasks
Authors:
A. M. Molnos,
M. J. M. Heijligers,
S. D. Cotofana,
J. T. J. Van Eijndhoven
Abstract:
Conventional cache models are not suited for real-time parallel processing because tasks may flush each other's data out of the cache in an unpredictable manner. As a result, the system is not compositional, so the overall performance is difficult to predict and the integration of new tasks is expensive. This paper proposes a new method that imposes compositionality on the system's performance and makes different memory hierarchy optimizations possible for multimedia communicating tasks when running on embedded multiprocessor architectures. The method is based on a cache allocation strategy that assigns sets of the unified cache exclusively to tasks and to the communication buffers. We also analytically formulate the problem and describe a method to compute the cache partitioning ratio for optimizing the throughput and the consumed power. When applied to a multiprocessor with a memory hierarchy, our technique also delivers a performance gain. Compared to the shared cache case, an application consisting of two JPEG decoders and one edge detection algorithm experiences 5 times fewer misses, and an MPEG-2 decoder experiences 6.5 times fewer misses.
Submitted 25 October, 2007;
originally announced October 2007.