-
Enhancing Granular Sentiment Classification with Chain-of-Thought Prompting in Large Language Models
Authors:
Vihaan Miriyala,
Smrithi Bukkapatnam,
Lavanya Prahallad
Abstract:
We explore the use of Chain-of-Thought (CoT) prompting with large language models (LLMs) to improve the accuracy of granular sentiment categorization in app store reviews. Traditional numeric and polarity-based ratings often fail to capture the nuanced sentiment embedded in user feedback. We evaluated the effectiveness of CoT prompting versus simple prompting on 2000 Amazon app reviews by comparing each method's predictions to human judgements. CoT prompting improved classification accuracy from 84% to 93%, highlighting the benefit of explicit reasoning in enhancing sentiment analysis performance.
Submitted 7 May, 2025;
originally announced May 2025.
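The evaluation described in the abstract above (scoring each prompting method's predictions against human judgements) can be sketched as follows. This is a minimal illustration only: the `accuracy` helper, the sentiment category names, and the four toy reviews are hypothetical and are not taken from the paper's data or prompts.

```python
def accuracy(predictions, human_labels):
    """Fraction of predictions that exactly match the human-assigned label."""
    if len(predictions) != len(human_labels):
        raise ValueError("predictions and human_labels must have the same length")
    if not human_labels:
        raise ValueError("at least one labelled example is required")
    matches = sum(p == h for p, h in zip(predictions, human_labels))
    return matches / len(human_labels)


# Hypothetical toy data: granular sentiment labels for four reviews.
human = ["very positive", "negative", "neutral", "positive"]
simple_prompt_preds = ["positive", "negative", "neutral", "positive"]
cot_prompt_preds = ["very positive", "negative", "neutral", "positive"]

simple_acc = accuracy(simple_prompt_preds, human)
cot_acc = accuracy(cot_prompt_preds, human)
print(f"simple prompting: {simple_acc:.2f}, CoT prompting: {cot_acc:.2f}")
```

With granular categories, a near miss such as "positive" for "very positive" counts as an error under exact-match accuracy, which is why the label granularity matters when comparing methods.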
-
Data-aware Dynamic Execution of Irregular Workloads on Heterogeneous Systems
Authors:
Zhenyu Bai,
Dan Wu,
Pranav Dangi,
Dhananjaya Wijerathne,
Venkata Pavan Kumar Miriyala,
Tulika Mitra
Abstract:
Current approaches to scheduling workloads on heterogeneous systems with specialized accelerators often rely on manual partitioning, offloading tasks with specific compute patterns to accelerators. This method requires extensive experimentation and human effort to identify the tasks suitable for the accelerator. To solve this problem, we introduce DyPe, a scheduling framework tailored for heterogeneous systems with specialized accelerators. Our method automatically partitions, deploys, and reschedules execution when necessary by dynamically analyzing the characteristics of the input data and leveraging the inter-operator parallelism among heterogeneous devices.
DyPe navigates a multi-objective, multi-constraint design space that considers both system constraints and application requirements, which allows it to discover Pareto-optimal mapping configurations, improving the system's overall performance and effectively managing energy-performance trade-offs. To demonstrate the benefits of our approach on real hardware, we build a heterogeneous system of GPUs and FPGAs with peer-to-peer data transfers. The experiments show that conventional static scheduling is optimal for 13 out of 86 cases for different workloads and system settings while DyPe is adaptable and able to find the optimal schedule in 77 out of 86 cases, with an average of only 3.95% performance or energy efficiency loss in the sub-optimal cases. Performance evaluation of DyPe shows an average of 1.53x throughput and 1.09x energy efficiency improvement over the static schedule baseline and 1.44x throughput and 1.66x energy efficiency over the GPU-only baseline.
Submitted 10 February, 2025;
originally announced February 2025.
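The notion of Pareto-optimal mapping configurations mentioned in the abstract above can be illustrated with a small sketch. It assumes two objectives that are both maximized (throughput and energy efficiency); the `Mapping` type, the candidate names, and all numbers are hypothetical and do not come from DyPe or its experiments.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    """A hypothetical mapping configuration with two objectives to maximize."""
    name: str
    throughput: float
    energy_efficiency: float


def dominates(a: Mapping, b: Mapping) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = (a.throughput >= b.throughput
                and a.energy_efficiency >= b.energy_efficiency)
    strictly_better = (a.throughput > b.throughput
                       or a.energy_efficiency > b.energy_efficiency)
    return no_worse and strictly_better


def pareto_front(candidates: List[Mapping]) -> List[Mapping]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical candidates, normalized to a GPU-only baseline of 1.0.
candidates = [
    Mapping("gpu_only", 1.00, 1.00),
    Mapping("static_split", 1.20, 1.30),
    Mapping("fpga_heavy", 0.90, 1.60),
    Mapping("balanced", 1.50, 1.40),
]

front = pareto_front(candidates)
print([m.name for m in front])
```

A scheduler would then pick one point on this front according to the application's requirement, for example the highest-throughput mapping that still meets an energy budget.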
-
Multiply-and-Fire (MNF): An Event-driven Sparse Neural Network Accelerator
Authors:
Miao Yu,
Tingting Xiang,
Venkata Pavan Kumar Miriyala,
Trevor E. Carlson
Abstract:
Machine learning, particularly deep neural network inference, has become a vital workload for many computing systems, from data centers and HPC systems to edge-based computing. As advances in sparsity have helped improve the efficiency of AI acceleration, there is a continued need for improved system efficiency for both high-performance and system-level acceleration.
This work takes a unique look at sparsity with an event-driven (or activation-driven) approach to ANN acceleration that aims to minimize useless work, improve utilization, and increase performance and energy efficiency. Our analytical and experimental results show that this event-driven solution presents a new direction to enable highly efficient AI inference for both CNN and MLP workloads.
This work demonstrates state-of-the-art energy efficiency and performance centring on activation-based sparsity and a highly parallel dataflow method that improves the overall functional unit utilization (at 30 fps). This work enhances energy efficiency over a state-of-the-art solution by 1.46$\times$. Taken together, this methodology presents a new direction to achieve high-efficiency, high-performance designs for next-generation AI acceleration platforms.
Submitted 20 April, 2022;
originally announced April 2022.
-
Ultra-low power on-chip learning of speech commands with phase-change memories
Authors:
Venkata Pavan Kumar Miriyala,
Masatoshi Ishii
Abstract:
Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. As edge devices typically spend most of their time in sleep mode and only wake up infrequently to collect and process sensor data, non-volatile in-memory computing (NVIMC) is a promising approach to design the next generation of edge-AI devices. Recently, we proposed an NVIMC-based neuromorphic accelerator using phase-change memories (PCMs), which we call Raven. In this work, we demonstrate ultra-low-power on-chip training and inference of speech commands using Raven. We show that Raven can be trained on-chip with power consumption as low as 30 µW, which is suitable for edge applications. Furthermore, we show that at iso-accuracy, Raven needs 70.36x and 269.23x fewer computations than a deep neural network (DNN) during inference and training, respectively. Owing to such low power and computational requirements, Raven provides a promising pathway towards ultra-low-power training and inference at the edge.
Submitted 21 October, 2020;
originally announced October 2020.
-
You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy
Authors:
Srivatsa P,
Kyle Timothy Ng Chu,
Burin Amornpaisannon,
Yaswanth Tavva,
Venkata Pavan Kumar Miriyala,
Jibin Wu,
Malu Zhang,
Haizhou Li,
Trevor E. Carlson
Abstract:
In the past decade, advances in Artificial Neural Networks (ANNs) have allowed them to perform extremely well for a wide range of tasks. In fact, they have reached human parity when performing image recognition, for example. Unfortunately, the accuracy of these ANNs comes at the expense of a large number of cache and/or memory accesses and compute operations. Spiking Neural Networks (SNNs), a type of neuromorphic, or brain-inspired, network, have recently gained significant interest as power-efficient alternatives to ANNs, because they are sparse, accessing very few weights, and typically only use addition operations instead of the more power-intensive multiply-and-accumulate (MAC) operations. The vast majority of neuromorphic hardware designs support rate-encoded SNNs, where the information is encoded in spike rates. Rate encoding can be seen as an inefficient scheme because it involves the transmission of a large number of spikes. A more efficient encoding scheme, Time-To-First-Spike (TTFS) encoding, encodes information in the relative time of arrival of spikes. While TTFS-encoded SNNs are more efficient than rate-encoded SNNs, they have, up to now, performed poorly in terms of accuracy compared to previous methods. Hence, in this work, we aim to overcome the limitations of TTFS-encoded neuromorphic systems. To accomplish this, we propose: (1) a novel optimization algorithm for TTFS-encoded SNNs converted from ANNs and (2) a novel hardware accelerator for TTFS-encoded SNNs, with a scalable and low-power design. Overall, our work in TTFS encoding and training improves the accuracy of SNNs to achieve state-of-the-art results on MNIST MLPs, while reducing power consumption by 1.46$\times$ over the state-of-the-art neuromorphic hardware.
Submitted 8 November, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
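The difference in spike counts between rate encoding and TTFS encoding, as contrasted in the abstract above, can be sketched with a toy example. The encoders below follow one common textbook convention (a deterministic rate code, and a TTFS code in which larger values fire earlier); the function names and the mapping from value to spike time are assumptions made for illustration, not the scheme used in the paper.

```python
from typing import List


def rate_encode(value: float, num_steps: int) -> List[int]:
    """Deterministic rate code: about value * num_steps spikes, evenly spread."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    train, accumulator = [], 0.0
    for _ in range(num_steps):
        accumulator += value
        if accumulator >= 1.0:
            train.append(1)       # emit a spike and keep the remainder
            accumulator -= 1.0
        else:
            train.append(0)
    return train


def ttfs_encode(value: float, num_steps: int) -> List[int]:
    """TTFS code: at most one spike; larger values spike earlier."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    train = [0] * num_steps
    if value > 0.0:
        spike_time = round((1.0 - value) * (num_steps - 1))
        train[spike_time] = 1
    return train


# Compare the number of transmitted spikes for the same input values.
for value in (0.25, 0.5, 0.75):
    rate_spikes = sum(rate_encode(value, 8))
    ttfs_spikes = sum(ttfs_encode(value, 8))
    print(f"value={value}: rate spikes={rate_spikes}, TTFS spikes={ttfs_spikes}")
```

In this sketch the rate code needs a number of spikes proportional to the encoded value, while the TTFS code always sends at most one spike and carries the value in its timing.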
-
Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks
Authors:
Malu Zhang,
Jiadong Wang,
Burin Amornpaisannon,
Zhixuan Zhang,
VPK Miriyala,
Ammar Belatreche,
Hong Qu,
Jibin Wu,
Yansong Chua,
Trevor E. Carlson,
Haizhou Li
Abstract:
Spiking Neural Networks (SNNs) use spatio-temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation. Motivated by the success of deep learning, the study of Deep Spiking Neural Networks (DeepSNNs) provides promising directions for artificial intelligence applications. However, training of DeepSNNs is not straightforward because the well-studied error back-propagation (BP) algorithm is not directly applicable. In this paper, we first establish an understanding as to why error back-propagation does not work well in DeepSNNs. To address this problem, we propose a simple yet efficient Rectified Linear Postsynaptic Potential function (ReL-PSP) for spiking neurons and propose a Spike-Timing-Dependent Back-Propagation (STDBP) learning algorithm for DeepSNNs. In the STDBP algorithm, the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner. Our experimental results show that the proposed learning algorithm achieves state-of-the-art classification accuracy among single-spike-time-based learning algorithms for DeepSNNs. Furthermore, by utilizing the trained model parameters obtained from the proposed STDBP learning algorithm, we demonstrate ultra-low-power inference operations on a recently proposed neuromorphic inference accelerator. Experimental results show that the neuromorphic hardware consumes 0.751 mW of total power and achieves a low latency of 47.71 ms to classify an image from the MNIST dataset. Overall, this work investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective on the design of future DeepSNNs and neuromorphic hardware systems.
Submitted 3 November, 2020; v1 submitted 26 March, 2020;
originally announced March 2020.
-
SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator
Authors:
Venkata Pavan Kumar Miriyala,
Kale Rahul Vishwanath,
Xuanyao Fong
Abstract:
Magnetic skyrmions are emerging as potential candidates for next-generation non-volatile memories. In this paper, we propose an in-memory binary neural network (BNN) accelerator based on non-volatile skyrmionic memory, which we call SIMBA. SIMBA consumes 26.7 mJ of energy and incurs 2.7 ms of latency when running an inference on a VGG-like BNN. Furthermore, we demonstrate improvements in the performance of SIMBA by optimizing material parameters such as saturation magnetization, anisotropic energy and damping ratio. Finally, we show that the inference accuracy of BNNs is robust against the possible stochastic behavior of SIMBA (88.5% +/- 1%).
Submitted 11 March, 2020;
originally announced March 2020.