-
Hierarchical Recording Architecture for Three-Dimensional Magnetic Recording
Authors:
Yugen Jian,
Ke Luo,
Jincai Chen,
Xuanyao Fong
Abstract:
Three-dimensional magnetic recording (3DMR) is a highly promising approach to achieving ultra-large data storage capacity in hard disk drives. One of the greatest challenges for 3DMR lies in performing sequential and correct writing of bits into the multi-layer recording medium. In this work, we have proposed a hierarchical recording architecture based on layered heat-assisted writing with a multi-head array. The feasibility of the architecture is validated in a dual-layer 3DMR system with FePt-based thin films via micromagnetic simulation. Our results reveal the magnetization reversal mechanism of the grains, ultimately attaining appreciable switching probability and medium signal-to-noise ratio (SNR) for each layer. In particular, an optimal head-to-head distance is identified as the one that maximizes the medium SNR. Optimizing the system's noise resistance will improve the overall SNR and allow for a smaller optimal head-to-head distance, which can pave the way for scaling 3DMR to more recording layers.
Submitted 27 January, 2025;
originally announced January 2025.
-
Analysis of Higher-Order Ising Hamiltonians
Authors:
Yunuo Cen,
Zhiwei Zhang,
Zixuan Wang,
Yimin Wang,
Xuanyao Fong
Abstract:
It is challenging to scale Ising machines for industrial-level problems due to algorithm or hardware limitations. Although higher-order Ising models provide a more compact encoding, they are hard to physically implement. This work proposes a theoretical framework of a higher-order Ising simulator, IsingSim. The Ising spins and gradients in IsingSim are decoupled and self-customizable. We significantly accelerate the simulation speed via a bidirectional approach for differentiating the hyperedge functions. Our proof-of-concept implementation verifies the theoretical framework by simulating the Ising spins with exact and approximate gradients. Experimental results show that our novel framework can be a useful tool for providing design guidelines for higher-order Ising machines.
Submitted 17 December, 2024;
originally announced December 2024.
-
Massively Parallel Continuous Local Search for Hybrid SAT Solving on GPUs
Authors:
Yunuo Cen,
Zhiwei Zhang,
Xuanyao Fong
Abstract:
Although state-of-the-art (SOTA) SAT solvers based on conflict-driven clause learning (CDCL) have achieved remarkable engineering success, their sequential nature limits the parallelism that may be extracted for acceleration on platforms such as the graphics processing unit (GPU). In this work, we propose FastFourierSAT, a highly parallel hybrid SAT solver based on gradient-driven continuous local search (CLS). This is realized by a novel parallel algorithm inspired by the Fast Fourier Transform (FFT)-based convolution for computing the elementary symmetric polynomials (ESPs), which is the major computational task in previous CLS methods. The complexity of our algorithm matches the best previous result. Furthermore, the substantial parallelism inherent in our algorithm can leverage the GPU for acceleration, demonstrating significant improvement over the previous CLS approaches. We also propose to incorporate the restart heuristics in CLS to improve search efficiency. We compare our approach with the SOTA parallel SAT solvers on several benchmarks. Our results show that FastFourierSAT computes the gradient 100+ times faster than previous prototypes implemented on CPU. Moreover, FastFourierSAT solves most instances and demonstrates promising performance on larger-size instances.
Submitted 29 August, 2023;
originally announced August 2023.
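The abstract above names the elementary symmetric polynomials (ESPs) as the main computational task and mentions FFT-based convolution as the inspiration. As a minimal sketch of the underlying identity only, and not the FastFourierSAT algorithm itself, the ESPs of x_1, ..., x_n are the coefficients of the product (1 + x_1 t)(1 + x_2 t)...(1 + x_n t), so they can be obtained by multiplying coefficient arrays pairwise. The function name `esp_by_convolution` is hypothetical, and `np.convolve` performs a direct convolution here rather than an FFT.

```python
import numpy as np


def esp_by_convolution(xs):
    """Return [e_0, e_1, ..., e_n], the elementary symmetric polynomials of xs.

    Uses prod_i (1 + x_i * t) = sum_k e_k * t^k. Coefficient arrays are
    stored lowest degree first, so [1, x] represents 1 + x * t.
    """
    polys = [np.array([1.0, float(x)]) for x in xs]
    if not polys:
        return np.array([1.0])  # empty product: only e_0 = 1
    # Multiply factors pairwise in a balanced tree. The products within one
    # level are independent of each other, which is what makes this kind of
    # scheme a natural fit for parallel hardware.
    while len(polys) > 1:
        merged = []
        for i in range(0, len(polys) - 1, 2):
            merged.append(np.convolve(polys[i], polys[i + 1]))
        if len(polys) % 2 == 1:
            merged.append(polys[-1])  # odd factor carries over unchanged
        polys = merged
    return polys[0]


if __name__ == "__main__":
    # (1 + t)(1 + 2t)(1 + 3t) = 1 + 6t + 11t^2 + 6t^3
    print(esp_by_convolution([1, 2, 3]))
```

For long factors, each `np.convolve` call could be swapped for an FFT-based product; that substitution is the standard way to lower the cost of polynomial multiplication and is left out here to keep the sketch short.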
-
Energy-efficient superparamagnetic Ising machine and its application to traveling salesman problems
Authors:
Jia Si,
Shuhan Yang,
Yunuo Cen,
Jiaer Chen,
Zhaoyang Yao,
Dong-Jun Kim,
Kaiming Cai,
Jerald Yoo,
Xuanyao Fong,
Hyunsoo Yang
Abstract:
The growth of artificial intelligence and IoT has created a significant computational load for solving non-deterministic polynomial-time (NP)-hard problems, which are difficult to solve using conventional computers. The Ising computer, based on the Ising model and annealing process, has been highly sought for finding approximate solutions to NP-hard problems by observing the convergence of dynamic spin states. However, it faces several challenges, including high power consumption due to artificial spins and randomness emulated by complex circuits, as well as low scalability caused by the rapidly growing connectivity when considering large-scale problems. Here, we present an experimental Ising annealing computer based on superparamagnetic tunnel junctions (SMTJs) with all-to-all connections, which successfully solves a 70-city travelling salesman problem (4761-node Ising problem). By taking advantage of the intrinsic randomness of SMTJs, implementing a proper global annealing scheme, and using an efficient algorithm, our SMTJ-based Ising annealer shows superior performance in terms of power consumption and energy efficiency compared to other Ising schemes. Additionally, our approach provides a promising way to solve complex problems with limited hardware resources. Moreover, we propose a crossbar array architecture for scalable integration using conventional magnetic random access memories. Our results demonstrate that the SMTJ-based Ising annealing computer with high energy efficiency, speed, and scalability is a strong candidate for future unconventional computing schemes.
Submitted 20 June, 2023;
originally announced June 2023.
-
Electrical Tunable Spintronic Neuron with Trainable Activation Function
Authors:
Yue Xin,
Kang Zhou,
Xuanyao Fong,
Yumeng Yang,
Shenghua Gao,
Zhifeng Zhu
Abstract:
Spintronic devices have been widely studied for the hardware realization of artificial neurons. The stochastic switching of a magnetic tunnel junction driven by the spin torque is commonly used to produce the sigmoid activation function. However, the shape of the activation function in previous studies is fixed during the training of the neural network. This restricts the updating of weights and results in limited performance. In this work, we exploit the physics behind the spin-torque-induced magnetization switching to enable the dynamic change of the activation function during the training process. Specifically, the pulse width and magnetic anisotropy can be electrically controlled to change the slope of the activation function, which enables a faster or slower change of output required by the backpropagation algorithm. This is also similar to the idea of batch normalization that is widely used in machine learning. Thus, this work demonstrates that the algorithms are no longer limited to the software implementation. They can in fact be realized by the spintronic hardware using a single device. Finally, we show that the accuracy of hand-written digit recognition can be improved from 88% to 91.3% by using these trainable spintronic neurons without introducing additional energy consumption. Our proposals can stimulate the hardware realization of spintronic neural networks.
Submitted 23 November, 2022;
originally announced November 2022.
-
CITS: Coherent Ising Tree Search Algorithm Towards Solving Combinatorial Optimization Problems
Authors:
Yunuo Cen,
Debasis Das,
Xuanyao Fong
Abstract:
Simulated annealing (SA) attracts more attention among classical heuristic algorithms because the solution of the combinatorial optimization problem can be naturally mapped to the ground state of the Ising Hamiltonian. However, in practical implementation, the annealing process cannot be arbitrarily slow and hence, it may deviate from the expected stationary Boltzmann distribution and become trapped in a local energy minimum. To overcome this problem, this paper proposes a heuristic search algorithm by expanding the search space from a Markov chain to a recursive depth-limited tree based on SA, where the parent and child nodes represent the current and future spin states. At each iteration, the algorithm will select the best near-optimal solution within the feasible search space by exploring along the tree in the sense of 'look ahead'. Furthermore, motivated by the coherent Ising machine (CIM), we relax the discrete representation of spin states to a continuous representation with a regularization term and utilize the reduced dynamics of the oscillators to explore the surrounding neighborhood of the selected tree nodes. We tested our algorithm on a representative NP-hard problem (MAX-CUT) to illustrate the effectiveness of this algorithm compared to semi-definite programming (SDP), SA, and simulated CIM. Our results show that above the primal heuristics SA and CIM, our high-level tree search strategy is able to provide solutions within fewer epochs for Ising-formulated NP-optimization problems.
Submitted 9 March, 2022;
originally announced March 2022.
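The abstract above uses plain simulated annealing on the Ising formulation of MAX-CUT as a baseline. The sketch below illustrates only that baseline, not the CITS tree search. It assumes the common mapping in which the Hamiltonian is H = sum over edges of w_ij * s_i * s_j with spins s_i in {-1, +1}, so that lowering H raises the cut weight. The function name `anneal_max_cut`, the geometric cooling schedule, and all default parameters are illustrative assumptions.

```python
import math
import random


def anneal_max_cut(n, edges, sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing for MAX-CUT; edges is a list of (i, j, weight)."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    def cut_value(spins):
        # An edge is cut when its endpoints carry opposite spins.
        return sum(w for i, j, w in edges if spins[i] != spins[j])

    spins = [rng.choice((-1, 1)) for _ in range(n)]
    best_spins, best_cut = spins[:], cut_value(spins)
    for sweep in range(sweeps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (sweep / max(sweeps - 1, 1))
        for _ in range(n):
            i = rng.randrange(n)
            local_field = sum(w * spins[j] for j, w in neighbors[i])
            # Energy change of H = sum w_ij s_i s_j when spin i is flipped.
            delta = -2.0 * spins[i] * local_field
            # Metropolis rule: always accept downhill, sometimes uphill.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                spins[i] = -spins[i]
        current = cut_value(spins)
        if current > best_cut:
            best_spins, best_cut = spins[:], current
    return best_spins, best_cut


if __name__ == "__main__":
    # A 4-cycle with unit weights is bipartite, so all four edges can be cut.
    square = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
    print(anneal_max_cut(4, square))
```

Each step here follows a single Markov chain of spin flips; the abstract describes replacing that chain with a depth-limited tree of candidate future states, which this sketch does not attempt.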
-
Connection Pruning for Deep Spiking Neural Networks with On-Chip Learning
Authors:
Thao N. N. Nguyen,
Bharadwaj Veeravalli,
Xuanyao Fong
Abstract:
Long training time hinders the potential of the deep, large-scale Spiking Neural Network (SNN) with the on-chip learning capability to be realized on the embedded systems hardware. Our work proposes a novel connection pruning approach that can be applied during the on-chip Spike Timing Dependent Plasticity (STDP)-based learning to optimize the learning time and the network connectivity of the deep SNN. We applied our approach to a deep SNN with the Time To First Spike (TTFS) coding and successfully achieved 2.1x speed-up and 64% energy savings in the on-chip learning and reduced the network connectivity by 92.83%, without incurring any accuracy loss. Moreover, the connectivity reduction results in 2.83x speed-up and 78.24% energy savings in the inference. Evaluation of our proposed approach on the Field Programmable Gate Array (FPGA) platform revealed that a 0.56% power overhead was needed to implement the pruning algorithm.
Submitted 31 July, 2021; v1 submitted 8 October, 2020;
originally announced October 2020.
-
SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator
Authors:
Venkata Pavan Kumar Miriyala,
Kale Rahul Vishwanath,
Xuanyao Fong
Abstract:
Magnetic skyrmions are emerging as potential candidates for next-generation non-volatile memories. In this paper, we propose an in-memory binary neural network (BNN) accelerator based on the non-volatile skyrmionic memory, which we call SIMBA. SIMBA consumes 26.7 mJ of energy and incurs 2.7 ms of latency when running an inference on a VGG-like BNN. Furthermore, we demonstrate improvements in the performance of SIMBA by optimizing material parameters such as saturation magnetization, anisotropic energy and damping ratio. Finally, we show that the inference accuracy of BNNs is robust against the possible stochastic behavior of SIMBA (88.5% +/- 1%).
Submitted 11 March, 2020;
originally announced March 2020.
-
Yield, Area and Energy Optimization in Stt-MRAMs using failure aware ECC
Authors:
Zoha Pajouhi,
Xuanyao Fong,
Anand Raghunathan,
Kaushik Roy
Abstract:
Spin Transfer Torque MRAMs are attractive due to their non-volatility, high density and zero leakage. However, STT-MRAMs suffer from poor reliability due to shared read and write paths. Additionally, conflicting requirements for data retention and write-ability (both related to the energy barrier height of the magnet) make design more challenging. Furthermore, the energy barrier height depends on the physical dimensions of the free layer. Any variations in the dimensions of the free layer lead to variations in the energy barrier height. In order to address the poor reliability of STT-MRAMs, usage of Error Correcting Codes (ECC) has been proposed. Unlike traditional CMOS memory technologies, ECC is expected to correct both soft and hard errors in STT-MRAMs. To achieve acceptable yield with low write power, stronger ECC is required, resulting in an increased number of encoded bits and degraded memory efficiency. In this paper, we propose Failure aware ECC (FaECC), which masks permanent faults while maintaining the same correction capability for soft errors without increased encoded bits. Furthermore, we investigate the impact of process variations on run-time reliability of STT-MRAMs. We provide an analysis on the impact of process variations on the life-time of the free layer and retention failures. In order to analyze the effectiveness of our methodology, we developed a cross-layer simulation framework that consists of device, circuit and array level analysis of STT-MRAM memory arrays. Our results show that using FaECC relaxes the requirements on the energy barrier height, which reduces the write energy and results in smaller access transistor size and memory array area. Keywords: STT-MRAM, reliability, Error Correcting Codes, ECC, magnetic memory
Submitted 16 June, 2016; v1 submitted 28 September, 2015;
originally announced September 2015.
-
Spin-Orbit Torque Induced Spike-Timing Dependent Plasticity
Authors:
Abhronil Sengupta,
Zubair Al Azim,
Xuanyao Fong,
Kaushik Roy
Abstract:
Nanoelectronic devices that mimic the functionality of synapses are a crucial requirement for performing cortical simulations of the brain. In this work we propose a ferromagnet-heavy metal heterostructure that employs spin-orbit torque to implement Spike-Timing Dependent Plasticity. The proposed device offers the advantage of decoupled spike transmission and programming current paths, thereby leading to reliable operation during online learning. Possible arrangement of such devices in a crosspoint architecture can pave the way for ultra-dense neural networks. Simulation studies indicate that the device has the potential of achieving pico-Joule level energy consumption (maximum 2 pJ per synaptic event) which is comparable to the energy consumption for synaptic events in biological synapses.
Submitted 19 December, 2014;
originally announced December 2014.
-
Laser Induced Magnetization Reversal for Detection in Optical Interconnects
Authors:
Zubair Al Azim,
Xuanyao Fong,
Thomas Ostler,
Roy Chantrell,
Kaushik Roy
Abstract:
Optical interconnect has emerged as the front-runner to replace electrical interconnect, especially for off-chip communication. However, a major drawback with optical interconnects is the need for photodetectors and amplifiers at the receiver, implemented usually by direct bandgap semiconductors and analog CMOS circuits, leading to large energy consumption and slow operating time. In this article, we propose a new optical interconnect architecture that uses a magnetic tunnel junction (MTJ) at the receiver side that is switched by femtosecond laser pulses. The state of the MTJ can be sensed using simple digital CMOS latches, resulting in significant improvement in energy consumption. Moreover, magnetization in the MTJ can be switched on the picosecond time scale and our design can operate at a speed of 5 Gbits/sec for a single link.
Submitted 9 October, 2014;
originally announced October 2014.