-
Robotics Under Construction: Challenges on Job Sites
Authors:
Haruki Uchiito,
Akhilesh Bhat,
Koji Kusaka,
Xiaoya Zhang,
Hiraku Kinjo,
Honoka Uehara,
Motoki Koyama,
Shinji Natsume
Abstract:
As labor shortages and productivity stagnation increasingly challenge the construction industry, automation has become essential for sustainable infrastructure development. This paper presents an autonomous payload transportation system as an initial step toward fully unmanned construction sites. Our system, based on the CD110R-3 crawler carrier, integrates autonomous navigation, fleet management, and GNSS-based localization to facilitate material transport in construction site environments. While the current system does not yet incorporate dynamic environment adaptation algorithms, we have begun fundamental investigations into external-sensor-based perception and mapping systems. Preliminary results highlight the potential challenges, including navigation in evolving terrain, environmental perception under construction-specific conditions, and sensor placement optimization for improving autonomy and efficiency. Looking forward, we envision a construction ecosystem where collaborative autonomous agents dynamically adapt to site conditions, optimizing workflow and reducing human intervention. This paper provides foundational insights into the future of robotics-driven construction automation and identifies critical areas for further technological development.
Submitted 24 June, 2025;
originally announced June 2025.
-
Second Order State Hallucinations for Adversarial Attack Mitigation in Formation Control of Multi-Agent Systems
Authors:
Laksh Patel,
Akhilesh Raj
Abstract:
The increasing deployment of multi-agent systems (MAS) in critical infrastructures such as autonomous transportation, disaster relief, and smart cities demands robust formation control mechanisms resilient to adversarial attacks. Traditional consensus-based controllers, while effective under nominal conditions, are highly vulnerable to data manipulation, sensor spoofing, and communication failures. To address this challenge, we propose Second-Order State Hallucination (SOSH), a novel framework that detects compromised agents through distributed residual monitoring and maintains formation stability by replacing attacked states with predictive second-order approximations. Unlike existing mitigation strategies that require significant restructuring or induce long transients, SOSH offers a lightweight, decentralized correction mechanism based on second-order Taylor expansions, enabling rapid and scalable resilience. We establish rigorous Lyapunov-based stability guarantees, proving that formation errors remain exponentially bounded even under persistent attacks, provided the hallucination parameters satisfy explicit conditions. Comprehensive Monte Carlo experiments on a 5-agent complete graph formation demonstrate that SOSH outperforms established robust control schemes, including W-MSR and Huber-based consensus filters, achieving faster convergence rates, lower steady-state error, and superior transient recovery. Our results confirm that SOSH combines theoretical robustness with practical deployability, offering a promising direction for securing MAS formations against sophisticated adversarial threats.
Submitted 14 June, 2025;
originally announced June 2025.
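The SOSH abstract above describes replacing an attacked agent's state with a predictive second-order Taylor approximation once a residual monitor flags it. The following is a minimal sketch of that general idea only, not the authors' formulation: the function names, the scalar state, the finite-difference estimates of velocity and acceleration, and the fixed residual threshold are all assumptions made for illustration.

```python
def hallucinate_state(x_prev2, x_prev1, x_last, dt):
    """Extrapolate one step ahead from the last three trusted samples.

    Hypothetical helper: velocity and acceleration are estimated by
    finite differences, which is an assumption of this sketch.
    """
    # Second difference approximates the acceleration.
    accel = (x_last - 2.0 * x_prev1 + x_prev2) / dt**2
    # Backward difference plus a curvature correction approximates
    # the velocity at the most recent sample.
    vel = (x_last - x_prev1) / dt + 0.5 * accel * dt
    # Second-order Taylor step: x(t + dt) ~ x + v*dt + a*dt^2 / 2.
    return x_last + vel * dt + 0.5 * accel * dt**2


def filter_neighbor(history, reported, dt, threshold):
    """Return the reported state, or the prediction if the residual is large.

    `history` holds at least three previously trusted scalar states.
    `threshold` is a hypothetical tuning parameter.
    """
    predicted = hallucinate_state(history[-3], history[-2], history[-1], dt)
    residual = abs(reported - predicted)
    return predicted if residual > threshold else reported
```

For a neighbor whose trusted history follows a quadratic trajectory, the extrapolation is exact, so a spoofed report far from the prediction is replaced while a report close to it passes through unchanged.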
-
Predictive Performance of Photonic SRAM-based In-Memory Computing for Tensor Decomposition
Authors:
Sasindu Wijeratne,
Sugeet Sunder,
Md Abdullah-Al Kaiser,
Akhilesh Jaiswal,
Clynn Mathew,
Ajey P. Jacob,
Viktor Prasanna
Abstract:
Photonics-based in-memory computing systems have demonstrated a significant speedup over traditional transistor-based systems because of their ultra-fast operating frequencies and high data bandwidths. Photonic static random access memory (pSRAM) is a crucial component for achieving the objective of ultra-fast photonic in-memory computing systems. In this work, we model and evaluate the performance of a novel photonic SRAM array architecture in development. Additionally, we examine hyperspectral operation through wavelength division multiplexing (WDM) to enhance the throughput of the pSRAM array. We map Matricized Tensor Times Khatri-Rao Product (MTTKRP), a computational kernel commonly used in tensor decomposition, to the proposed pSRAM array architecture. We also develop a predictive performance model to estimate the sustained performance of different configurations of the pSRAM array. Using the predictive performance model, we demonstrate that the pSRAM array achieves 17 PetaOps while performing MTTKRP in a practical hardware configuration.
Submitted 23 March, 2025;
originally announced March 2025.
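For readers unfamiliar with the MTTKRP kernel named in the abstract above, the sketch below computes the standard mode-1 definition for a dense 3-way tensor, M[i][r] = sum over j, k of X[i][j][k] * B[j][r] * C[k][r]. It is a plain reference loop in Python for illustration; the function name and the nested-list data layout are assumptions, and it says nothing about how the paper maps the kernel onto the pSRAM array.

```python
def mttkrp_mode1(X, B, C):
    """Mode-1 MTTKRP for a dense 3-way tensor stored as nested lists.

    X has shape I x J x K; B has shape J x R; C has shape K x R.
    Returns an I x R matrix M with
    M[i][r] = sum_{j,k} X[i][j][k] * B[j][r] * C[k][r].
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                x = X[i][j][k]
                if x == 0:
                    continue  # zero entries contribute nothing
                for r in range(R):
                    M[i][r] += x * B[j][r] * C[k][r]
    return M
```

The innermost loop is a multiply-accumulate over the rank dimension, which is the kind of operation in-memory computing arrays are typically designed to parallelize.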
-
Connecting the Persian-speaking World through Transliteration
Authors:
Rayyan Merchant,
Akhilesh Kakolu Ramarao,
Kevin Tang
Abstract:
Despite speaking mutually intelligible varieties of the same language, speakers of Tajik Persian, written in a modified Cyrillic alphabet, cannot read Iranian and Afghan texts written in the Perso-Arabic script. As the vast majority of Persian text on the Internet is written in Perso-Arabic, monolingual Tajik speakers are unable to interface with the Internet in any meaningful way. Due to the overwhelming similarity between the formal registers of these dialects and the scarcity of Tajik-Farsi parallel data, machine transliteration has been proposed as a more practical and appropriate solution than machine translation. This paper presents a transformer-based G2P approach to Tajik-Farsi transliteration, achieving chrF++ scores of 58.70 (Farsi to Tajik) and 74.20 (Tajik to Farsi) on novel digraphic datasets, setting a comparable baseline metric for future work. Our results also demonstrate the non-trivial difficulty of this task in both directions. We also provide an overview of the differences between the two scripts and the challenges they present, so as to aid future efforts in Tajik-Farsi transliteration.
Submitted 27 February, 2025;
originally announced February 2025.
-
Image Classification Method using Dynamic Quantum Inspired Genetic Algorithm
Authors:
Akhilesh Kumar Singh,
Kirankumar R. Hiremath
Abstract:
This study presents a dynamic Quantum-Inspired Genetic Algorithm (D-QIGA) for feature selection, leveraging quantum principles like superposition and rotation gates to enhance exploration and exploitation. D-QIGA introduces adaptive mechanisms and a lengthening chromosome strategy to avoid local optima and improve optimization. Tested on benchmark and real-world problems, it significantly outperforms traditional Genetic Algorithms, achieving over 99.99% classification accuracy compared to GA's 95%.
Submitted 4 April, 2025; v1 submitted 20 January, 2025;
originally announced January 2025.
-
Real-Time Computational Visual Aberration Correcting Display Through High-Contrast Inverse Blurring
Authors:
Akhilesh Balaji,
Dhruv Ramu
Abstract:
This paper presents a framework for developing a live vision-correcting display (VCD) to address refractive visual aberrations without the need for traditional vision correction devices like glasses or contact lenses, particularly in scenarios where wearing them may be inconvenient. We achieve this correction through deconvolution of the displayed image using a point spread function (PSF) associated with the viewer's eye. We address ringing artefacts using a masking technique applied to the prefiltered image. We also enhance the display's contrast and reduce color distortion by operating in the YUV/YCbCr color space, where deconvolution is performed solely on the luma (brightness) channel. Finally, we introduce a technique to calculate a real-time PSF that adapts based on the viewer's spherical coordinates relative to the screen. This ensures that the PSF remains accurate and undistorted even when the viewer observes the display from an angle relative to the screen normal, thereby providing consistent visual correction regardless of the viewing angle. The results of our display demonstrate significant improvements in visual clarity, achieving a structural similarity index (SSIM) of 83.04%, highlighting the effectiveness of our approach.
Submitted 30 December, 2024;
originally announced January 2025.
-
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
Authors:
Yun Peng,
Akhilesh Deepak Gotmare,
Michael Lyu,
Caiming Xiong,
Silvio Savarese,
Doyen Sahoo
Abstract:
Large Language Models (LLMs) are widely adopted for assisting in software development tasks, yet their performance evaluations have narrowly focused on the functional correctness of generated code. Human programmers, however, require LLM-generated code to be not only correct but also optimally efficient. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code by incorporating feedback based on runtime during test case execution into the self-refinement iterations. With PerfCodeGen, we achieve speedups for a significantly higher proportion of problems compared to using the base LLM with sophisticated prompting techniques. Applied to open language models like Phi-3-mini, PerfCodeGen achieves runtime efficiency comparable to prompting powerful closed models like GPT-4. We achieve state-of-the-art runtime efficiency on benchmarks such as HumanEval, MBPP, and APPS, frequently surpassing the ground truth reference solutions with PerfCodeGen using GPT-3.5 and GPT-4. Additionally, we demonstrate the effectiveness of our approach in enhancing code quality across a range of open LLMs of varying sizes including Phi-3-mini, Llama 3 8B, Mixtral 8x7B, Command R, and Llama 3 70B.
Submitted 18 November, 2024;
originally announced December 2024.
-
Frequency matters: Modeling irregular morphological patterns in Spanish with Transformers
Authors:
Akhilesh Kakolu Ramarao,
Kevin Tang,
Dinah Baer-Henney
Abstract:
Over the past decade, various studies have addressed how speakers solve the so-called Paradigm Cell Filling Problem (PCFP) (Ackerman et al., 2009) across different languages. The PCFP addresses a fundamental question in morphological processing: how do speakers accurately generate inflected forms of words when presented with incomplete paradigms? This problem is particularly salient when modeling complex inflectional systems. We focus on Spanish verbal paradigms, where certain verbs follow an irregular L-shaped pattern in which the first-person singular present indicative stem matches the stem used throughout the present subjunctive mood. We formulate the problem as a morphological reinflection task. Specifically, we investigate the role of input frequency in the acquisition of regular versus irregular L-shaped patterns in transformer models. By systematically manipulating the input distributions and analyzing model behavior, we reveal four key findings: 1) Models perform better on L-shaped verbs compared to regular verbs, especially in uneven frequency conditions; 2) Robust primacy effects are observed, but no consistent recency effects; 3) Memorization becomes more prominent as the proportion of L-shaped verbs increases; 4) There is a tendency to regularize L-shaped verbs when their consonant alternation pairs are rare or absent in the training data.
Submitted 27 May, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Voltage-Controlled Magnetic Tunnel Junction based ADC-less Global Shutter Processing-in-Pixel for Extreme-Edge Intelligence
Authors:
Md Abdullah-Al Kaiser,
Gourav Datta,
Jordan Athas,
Christian Duffee,
Ajey P. Jacob,
Pedram Khalili Amiri,
Peter A. Beerel,
Akhilesh R. Jaiswal
Abstract:
The vast amount of data generated by camera sensors has prompted the exploration of energy-efficient processing solutions for deploying computer vision tasks on edge devices. Among the various approaches studied, processing-in-pixel integrates massively parallel analog computational capabilities at the extreme-edge, i.e., within the pixel array, and exhibits enhanced energy and bandwidth efficiency by generating the output activations of the first neural network layer rather than the raw sensory data. In this article, we propose an energy and bandwidth efficient ADC-less processing-in-pixel architecture. This architecture implements an optimized binary activation neural network trained using a Hoyer regularizer for high accuracy on complex vision tasks. In addition, we introduce a global shutter burst memory read scheme utilizing a fast and disturb-free read operation that leverages an innovative use of nanoscale voltage-controlled magnetic tunnel junctions (VC-MTJs). Moreover, we develop an algorithmic framework incorporating device and circuit constraints (characteristic device switching behavior and circuit non-linearity) based on state-of-the-art fabricated VC-MTJ characteristics and extensive circuit simulations using commercial GlobalFoundries 22nm FDX technology. Finally, we evaluate the proposed system's performance on two complex datasets, CIFAR10 and ImageNet, showing improvements in front-end and communication energy efficiency by 8.2x and 8.5x respectively and a reduction in bandwidth by 6x compared to traditional computer vision systems, without any significant drop in the test accuracy.
Submitted 14 October, 2024;
originally announced October 2024.
-
IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?
Authors:
Akhilesh Aravapalli,
Mounika Marreddy,
Subba Reddy Oota,
Radhika Mamidi,
Manish Gupta
Abstract:
Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on questions such as: Which linguistic properties are encoded by these models, and to what extent? How robust are these models in encoding linguistic properties when faced with perturbations in the input text? However, these studies have mainly focused on BERT and the English language. In this paper, we investigate similar questions regarding encoding capability and robustness for 8 linguistic properties across 13 different perturbations in 6 Indic languages, using 9 multilingual Transformer models (7 universal and 2 Indic-specific). To conduct this study, we introduce a novel multilingual benchmark dataset, IndicSentEval, containing approximately 47K sentences. Surprisingly, our probing analysis of surface, syntactic, and semantic properties reveals that while almost all multilingual models demonstrate consistent encoding performance for English, they show mixed results for Indic languages. As expected, Indic-specific multilingual models capture linguistic properties in Indic languages better than universal models. Intriguingly, universal models broadly exhibit better robustness compared to Indic-specific models, particularly under perturbations such as dropping both nouns and verbs, dropping only verbs, or keeping only nouns. Overall, this study provides valuable insights into probing and perturbation-specific strengths and weaknesses of popular multilingual Transformer-based models for different Indic languages. We make our code and dataset publicly available [https://tinyurl.com/IndicSentEval].
Submitted 3 October, 2024;
originally announced October 2024.
-
Energy-Efficient & Real-Time Computer Vision with Intelligent Skipping via Reconfigurable CMOS Image Sensors
Authors:
Md Abdullah-Al Kaiser,
Sreetama Sarkar,
Peter A. Beerel,
Akhilesh R. Jaiswal,
Gourav Datta
Abstract:
Current video-based computer vision (CV) applications typically suffer from high energy consumption due to reading and processing all pixels in a frame, regardless of their significance. While previous works have attempted to reduce this energy by skipping input patches or pixels and using feedback from the end task to guide the skipping algorithm, the skipping is not performed during the sensor read phase. As a result, these methods cannot optimize the front-end sensor energy. Moreover, they may not be suitable for real-time applications due to the long latency of modern CV networks that are deployed in the back-end. To address this challenge, this paper presents a custom-designed reconfigurable CMOS image sensor (CIS) system that improves energy efficiency by selectively skipping uneventful regions or rows within a frame during the sensor's readout phase, and the subsequent analog-to-digital conversion (ADC) phase. A novel masking algorithm intelligently directs the skipping process in real-time, optimizing both the front-end sensor and back-end neural networks for applications including autonomous driving and augmented/virtual reality (AR/VR). Our system can also operate in standard mode without skipping, depending on application needs. We evaluate our hardware-algorithm co-design framework on object detection based on BDD100K and ImageNetVID, and gaze estimation based on OpenEDS, achieving up to 53% reduction in front-end sensor energy while maintaining state-of-the-art (SOTA) accuracy.
Submitted 25 September, 2024;
originally announced September 2024.
-
FPCA: Field-Programmable Pixel Convolutional Array for Extreme-Edge Intelligence
Authors:
Zihan Yin,
Akhilesh Jaiswal
Abstract:
The rapid advancement of neural network applications necessitates hardware that not only accelerates computation but also adapts efficiently to dynamic processing requirements. While processing-in-pixel has emerged as a promising solution to overcome the bottlenecks of traditional architectures at the extreme-edge, existing implementations face limitations in reconfigurability and scalability due to their static nature and inefficient area usage. Addressing these challenges, we present a novel architecture that significantly enhances the capabilities of processing-in-pixel for convolutional neural networks. Our design integrates non-volatile memory (NVM) with a novel unit pixel circuit design, enabling dynamic reconfiguration of synaptic weights, kernel size, channel size, and stride size, thus offering unprecedented flexibility and adaptability. By using a separate die for the pixel circuit and storing synaptic weights, our circuit achieves a substantial reduction in the required area per pixel, thereby increasing the density and scalability of the pixel array. Simulation results demonstrate the dot product operations of the circuit and the non-linearity of its analog output, and a novel bucket-select curvefit model is proposed to capture this non-linearity. This work not only addresses the limitations of current in-pixel computing approaches but also opens new avenues for developing more efficient, flexible, and scalable neural network hardware, paving the way for advanced AI applications.
Submitted 3 August, 2024;
originally announced August 2024.
-
Retina-Inspired Object Motion Segmentation for Event-Cameras
Authors:
Victoria Clerico,
Shay Snyder,
Arya Lohia,
Md Abdullah-Al Kaiser,
Gregory Schwartz,
Akhilesh Jaiswal,
Maryam Parsa
Abstract:
Event-cameras have emerged as a revolutionary technology with a high temporal resolution that far surpasses standard active pixel cameras. This technology draws biological inspiration from photoreceptors and the initial retinal synapse. This research showcases the potential of additional retinal functionalities to extract visual features. We provide a domain-agnostic and efficient algorithm for ego-motion compensation based on Object Motion Sensitivity (OMS), one of the multiple features computed within the mammalian retina. We develop a method based on experimental neuroscience that translates OMS' biological circuitry to a low-overhead algorithm to suppress camera motion, bypassing the need for deep networks and learning. Our system processes event data from dynamic scenes to perform pixel-wise object motion segmentation using a real and synthetic dataset. This paper introduces a bio-inspired computer vision method that dramatically reduces the number of parameters by a factor of $10^3$ to $10^6$ compared to previous approaches. Our work paves the way for robust, high-speed, and low-bandwidth decision-making for in-sensor computations.
Submitted 6 December, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
Hardware-Algorithm Re-engineering of Retinal Circuit for Intelligent Object Motion Segmentation
Authors:
Jason Sinaga,
Victoria Clerico,
Md Abdullah-Al Kaiser,
Shay Snyder,
Arya Lohia,
Gregory Schwartz,
Maryam Parsa,
Akhilesh Jaiswal
Abstract:
Recent advances in retinal neuroscience have fueled various hardware and algorithmic efforts to develop retina-inspired solutions for computer vision tasks. In this work, we focus on a fundamental visual feature within the mammalian retina, Object Motion Sensitivity (OMS). Using DVS data from the EV-IMO dataset, we analyze the performance of an algorithmic implementation of OMS circuitry for motion segmentation in the presence of ego-motion. This holistic analysis considers the underlying constraints arising from the hardware circuit implementation. We present novel CMOS circuits that implement OMS functionality inside image sensors, while providing run-time re-configurability for key algorithmic parameters. In-sensor technologies for dynamical environment adaptation are crucial for ensuring high system performance. Finally, we verify the functionality and re-configurability of the proposed CMOS circuit designs through Cadence simulations in 180nm technology. In summary, the presented work lays the foundation for hardware-algorithm re-engineering of known biological circuits to suit application needs.
Submitted 6 December, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Estimation of the Area and Precipitation Associated with a Tropical Cyclone Biparjoy by using Image Processing
Authors:
Shikha Verma,
Kuldeep Srivastava,
Akhilesh Tiwari,
Shekhar Verma
Abstract:
The rainfall associated with Tropical Cyclones (TCs) contributes a major amount to the annual rainfall in India. Due to the limited research on the quantitative precipitation associated with TCs, the prediction of the amount of precipitation and the area that it may cover remains a challenge. This paper proposes an approach to estimate the accumulated precipitation and the impact on the affected area using remote sensing data. This study considers the Extremely Severe Cyclonic Storm Biparjoy, which formed over the Arabian Sea and hit India in 2023, using satellite images from the IMERG-Late Run of Global Precipitation Measurement (GPM). Image processing techniques were employed to identify and extract precipitation clusters linked to the cyclone. The results indicate that Biparjoy contributed a daily average rainfall of 53.14 mm/day across India and the Arabian Sea, with the Indian boundary receiving 11.59 mm/day, covering an extensive 411.76 thousand square kilometers. The localized intensity and variability observed in states like Gujarat, Rajasthan, Madhya Pradesh, and Uttar Pradesh highlight the need for tailored response measures, emphasizing the importance of further research to enhance predictive models and disaster readiness, crucial for building resilience against the diverse impacts of tropical cyclones.
Submitted 7 July, 2024;
originally announced July 2024.
-
Toward High Performance, Programmable Extreme-Edge Intelligence for Neuromorphic Vision Sensors utilizing Magnetic Domain Wall Motion-based MTJ
Authors:
Md Abdullah-Al Kaiser,
Gourav Datta,
Peter A. Beerel,
Akhilesh R. Jaiswal
Abstract:
The desire to empower resource-limited edge devices with computer vision (CV) must overcome the high energy consumption of collecting and processing vast sensory data. To address the challenge, this work proposes an energy-efficient non-von-Neumann in-pixel processing solution for neuromorphic vision sensors employing an emerging (X) magnetic domain wall magnetic tunnel junction (MDWMTJ) for the first time, in conjunction with CMOS-based neuromorphic pixels. Our hybrid CMOS+X approach performs in-situ massively parallel asynchronous analog convolution, exhibiting low power consumption and high accuracy across various CV applications by leveraging the non-volatility and programmability of the MDWMTJ. Moreover, our developed device-circuit-algorithm co-design framework captures device constraints (low tunnel-magnetoresistance, low dynamic range) and circuit constraints (non-linearity, process variation, area consideration) based on Monte Carlo simulations and device parameters utilizing GF22nm FD-SOI technology. Our experimental results suggest we can achieve an average of 45.3% reduction in backend-processor energy, while maintaining front-end energy similar to the state of the art and achieving high accuracy of 79.17% and 95.99% on the DVS-CIFAR10 and IBM DVS128-Gesture datasets, respectively.
Submitted 23 February, 2024;
originally announced February 2024.
-
A Review on Digital Pixel Sensors
Authors:
Md Rahatul Islam Udoy,
Shamiul Alam,
Md Mazharul Islam,
Akhilesh Jaiswal,
Ahmedullah Aziz
Abstract:
Digital pixel sensor (DPS) has evolved as a pivotal component in modern imaging systems and has the potential to revolutionize various fields such as medical imaging, astronomy, surveillance, IoT devices, etc. Compared to analog pixel sensors, the DPS offers high speed and good image quality. However, the introduced intrinsic complexity within each pixel, primarily attributed to the accommodation of the ADC circuit, engenders a substantial increase in the pixel pitch. Unfortunately, such a pronounced escalation in pixel pitch drastically undermines the feasibility of achieving high-density integration, which is an obstacle that significantly narrows down the field of potential applications. Nonetheless, designing compact conversion circuits along with strategic integration of 3D architectural paradigms can be a potential remedy to the prevailing situation. This review article presents a comprehensive overview of the vast area of DPS technology. The operating principles, advantages, and challenges of different types of DPS circuits have been analyzed. We categorize the schemes into several categories based on ADC operation. A comparative study based on different performance metrics has also been showcased for a well-rounded understanding.
Submitted 28 November, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird's-Eye-View
Authors:
Arindam Das,
Sudarshan Paul,
Niko Scholz,
Akhilesh Kumar Malviya,
Ganesh Sistu,
Ujjwal Bhattacharya,
Ciarán Eising
Abstract:
Accurate obstacle identification represents a fundamental challenge within the scope of near-field perception for autonomous driving. Conventionally, fisheye cameras are frequently employed for comprehensive surround-view perception, including rear-view obstacle localization. However, the performance of such cameras can significantly deteriorate in low-light conditions, during nighttime, or when subjected to intense sun glare. Conversely, cost-effective sensors like ultrasonic sensors remain largely unaffected under these conditions. Therefore, we present, to our knowledge, the first end-to-end multimodal fusion model tailored for efficient obstacle perception in a bird's-eye-view (BEV) perspective, utilizing fisheye cameras and ultrasonic sensors. Initially, ResNeXt-50 is employed as a set of unimodal encoders to extract features specific to each modality. Subsequently, the feature space associated with the visible spectrum undergoes transformation into BEV. The fusion of these two modalities is facilitated via concatenation. At the same time, the ultrasonic spectrum-based unimodal feature maps pass through content-aware dilated convolution, applied to mitigate the sensor misalignment between two sensors in the fused feature space. Finally, the fused features are utilized by a two-stage semantic occupancy decoder to generate grid-wise predictions for precise obstacle perception. We conduct a systematic investigation to determine the optimal strategy for multimodal fusion of both sensors. We provide insights into our dataset creation procedures, annotation guidelines, and perform a thorough data analysis to ensure adequate coverage of all scenarios. When applied to our dataset, the experimental results underscore the robustness and effectiveness of our proposed multimodal fusion approach.
Submitted 1 February, 2024;
originally announced February 2024.
-
TEPI: Taxonomy-aware Embedding and Pseudo-Imaging for Scarcely-labeled Zero-shot Genome Classification
Authors:
Sathyanarayanan Aakur,
Vishalini R. Laguduva,
Priyadharsini Ramamurthy,
Akhilesh Ramachandran
Abstract:
A species' genetic code or genome encodes valuable evolutionary, biological, and phylogenetic information that aids in species recognition, taxonomic classification, and understanding genetic predispositions like drug resistance and virulence. However, the vast number of potential species poses significant challenges in developing a general-purpose whole genome classification tool. Traditional bioinformatics tools have made notable progress but lack scalability and are computationally expensive. Machine learning-based frameworks show promise but must address the issue of large classification vocabularies with long-tail distributions. In this study, we propose addressing this problem through zero-shot learning using TEPI, Taxonomy-aware Embedding and Pseudo-Imaging. We represent each genome as pseudo-images and map them to a taxonomy-aware embedding space for reasoning and classification. This embedding space captures compositional and phylogenetic relationships of species, enabling predictions in extensive search spaces. We evaluate TEPI using two rigorous zero-shot settings and demonstrate its generalization capabilities qualitatively on curated, large-scale, publicly sourced data.
Submitted 23 January, 2024;
originally announced January 2024.
-
Recent Advances in Scalable Energy-Efficient and Trustworthy Spiking Neural networks: from Algorithms to Technology
Authors:
Souvik Kundu,
Rui-Jie Zhu,
Akhilesh Jaiswal,
Peter A. Beerel
Abstract:
Neuromorphic computing and, in particular, spiking neural networks (SNNs) have become an attractive alternative to deep neural networks for a broad range of signal processing applications, processing static and/or temporal inputs from different sensory modalities, including audio and vision sensors. In this paper, we start with a description of recent advances in algorithmic and optimization innovations to efficiently train and scale low-latency and energy-efficient SNNs for complex machine learning applications. We then discuss the recent efforts in algorithm-architecture co-design that explore the inherent trade-offs between achieving high energy-efficiency and low latency while still providing high accuracy and trustworthiness. We then describe the underlying hardware that has been developed to leverage such algorithmic innovations in an efficient way. In particular, we describe a hybrid method to integrate significant portions of the model's computation within both memory components as well as the sensor itself. Finally, we discuss the potential path forward for research in building deployable SNN systems, identifying key challenges in the algorithm-hardware-application co-design space with an emphasis on trustworthiness.
Submitted 2 December, 2023;
originally announced December 2023.
-
Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model
Authors:
Yu-Chen Lin,
Akhilesh Kumar,
Norman Chang,
Wenliang Zhang,
Muhammad Zakir,
Rucha Apte,
Haiyang He,
Chao Wang,
Jyh-Shing Roger Jang
Abstract:
We present four main contributions to enhance the performance of Large Language Models (LLMs) in generating domain-specific code: (i) utilizing LLM-based data splitting and data renovation techniques to improve the semantic representation of embeddings' space; (ii) introducing the Chain of Density for Renovation Credibility (CoDRC), driven by LLMs, and the Adaptive Text Renovation (ATR) algorithm for assessing data renovation reliability; (iii) developing the Implicit Knowledge Expansion and Contemplation (IKEC) Prompt technique; and (iv) effectively refactoring existing scripts to generate new and high-quality scripts with LLMs. By using engineering simulation software RedHawk-SC as a case study, we demonstrate the effectiveness of our data pre-processing method for expanding and categorizing scripts. When combined with IKEC, these techniques enhance the Retrieval-Augmented Generation (RAG) method in retrieving more relevant information, ultimately achieving a 73.33% "Percentage of Correct Lines" for code generation problems in MapReduce applications.
Submitted 30 January, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P2M) for Neuromorphic Vision Sensors
Authors:
Md Abdullah-Al Kaiser,
Akhilesh R. Jaiswal
Abstract:
The high volume of data transmission between the edge sensor and the cloud processor leads to energy and throughput bottlenecks for resource-constrained edge devices focused on computer vision. Hence, researchers are investigating different approaches (e.g., near-sensor processing, in-sensor processing, in-pixel processing) by executing computations closer to the sensor to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors (e.g., dynamic vision sensors (DVS)) involves incorporating asynchronous multiply-accumulate (MAC) operations within the pixel array, resulting in improved energy efficiency. In a CMOS implementation, low overhead energy-efficient analog MAC accumulates charges on a passive capacitor; however, the capacitor's limited charge retention time affects the algorithmic integration time choices, impacting the algorithmic accuracy, bandwidth, energy, and training efficiency. Consequently, this results in a design trade-off on the hardware aspect, creating a need for a low-leakage compute unit while maintaining the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off based on the limited integration time posed by the hardware and techniques to improve the leakage performance of the in-pixel analog MAC operations.
Submitted 7 October, 2023;
originally announced October 2023.
-
SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras
Authors:
Nithya R,
Malavika S,
Jordan F,
Arjun Gangwar,
Metilda N J,
S Umesh,
Rithik Sarab,
Akhilesh Kumar Dubey,
Govind Divakaran,
Samudra Vijaya K,
Suryakanth V Gangashetty
Abstract:
India is home to a multitude of languages, of which 22 are recognised by the Indian Constitution as official. Building speech-based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech-based applications in Indian languages, we are open sourcing the SPRING-INX data, which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavor is by SPRING Lab, Indian Institute of Technology Madras and is part of the National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.
Submitted 24 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
An Interactive Web-Based System for Creating Single Panel Cartoons with Visually Valid Compositions
Authors:
Ergun Akleman,
Akhilesh Vijaykumar,
Richard Furuta,
Derya Akleman
Abstract:
The creation of cartoon-based stories (comics) requires a lot of creativity and hard work for naive users. We observe that single-panel cartoons are the building blocks of any comic story. To develop a strong comic story, it is critical to obtain visually valid single panels. In this work, we have developed a methodology to guarantee the placement of characters to obtain a valid cartoon frame based on the methods used by professional cartoonists. Using this methodology, we have developed a web-based system to create single-panel cartoons from a given set of character images. We have made this system available on GitHub as open source so that this basic single-panel cartoon can be used as infrastructure to develop more complex structures. Our web-based system for single-panel cartoons can be viewed at http://storytelling.viz.tamu.edu.
Submitted 12 October, 2023;
originally announced October 2023.
-
A 9 Transistor SRAM Featuring Array-level XOR Parallelism with Secure Data Toggling Operation
Authors:
Zihan Yin,
Annewsha Datta,
Shwetha Vijayakumar,
Ajey Jacob,
Akhilesh Jaiswal
Abstract:
Security and energy-efficiency are critical for computing applications in general and for edge applications in particular. Digital in-Memory Computing (IMC) in SRAM cells has been widely studied to accelerate inference tasks to maximize both throughput and energy efficiency for intelligent computing at the edge. XOR operations have been of particular interest due to their wide applicability in numerous applications that include binary neural networks and encryption. However, existing IMC circuits for XOR acceleration are limited to two rows in a memory array, and extending the XOR parallelism to multiple rows in an SRAM array has remained elusive. Further, SRAM is prone to both data imprinting and data remanence security issues, which pose limitations on security. Based on the commercial Globalfoundries 22nm node, we propose a novel 9T SRAM cell such that multiple rows of data (entire array) can be XORed in a massively parallel single-cycle fashion. The new cell also supports efficient data-toggling within the SRAM cell to circumvent imprinting attacks and to erase the SRAM value in case of a remanence attack.
Submitted 11 August, 2023;
originally announced September 2023.
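The array-level XOR and data-toggling operations named in the abstract above can be illustrated with a small software model. This is only a behavioural sketch in Python, not the authors' 9T circuit: the row width `WORD_BITS`, the sample `array` contents, and the helper names `xor_rows` and `toggle_row` are all hypothetical and chosen for illustration.

```python
from functools import reduce
from operator import xor

WORD_BITS = 8  # hypothetical row width; the paper's array dimensions are not given here


def xor_rows(rows):
    """Return the bitwise XOR of every row (each row is an int of WORD_BITS bits)."""
    return reduce(xor, rows, 0)


def toggle_row(row, width=WORD_BITS):
    """Invert every stored bit by XORing the row with an all-ones mask."""
    return row ^ ((1 << width) - 1)


# Hypothetical three-row array; hardware would combine the rows in one cycle,
# whereas this model simply folds them sequentially.
array = [0b10100000, 0b01100000, 0b11110000]
result = xor_rows(array)
print(f"{result:0{WORD_BITS}b}")  # prints 00110000
```

Toggling a row twice with `toggle_row` restores the original value, which is the property a periodic data-toggling scheme relies on to avoid holding the same bit pattern in a cell for long periods.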
-
Deep Learning Techniques in Extreme Weather Events: A Review
Authors:
Shikha Verma,
Kuldeep Srivastava,
Akhilesh Tiwari,
Shekhar Verma
Abstract:
Extreme weather events pose significant challenges, thereby demanding techniques for accurate analysis and precise forecasting to mitigate their impact. In recent years, deep learning techniques have emerged as a promising approach for weather forecasting and understanding the dynamics of extreme weather events. This review aims to provide a comprehensive overview of the state-of-the-art deep learning in the field. We explore the utilization of deep learning architectures across various aspects of weather prediction such as thunderstorms, lightning, precipitation, droughts, heatwaves, cold waves, and tropical cyclones. We highlight the potential of deep learning, such as its ability to capture complex patterns and non-linear relationships. Additionally, we discuss the limitations of current approaches and highlight future directions for advancements in the field of meteorology. The insights gained from this systematic review are crucial for the scientific community to make informed decisions and mitigate the impacts of extreme weather events.
Submitted 18 August, 2023;
originally announced August 2023.
-
A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes
Authors:
Akhilesh Raj,
Swann Perarnau,
Aniruddha Gokhale
Abstract:
As Exascale computing becomes a reality, the energy needs of compute nodes in cloud data centers will continue to grow. A common approach to reducing this energy demand is to limit the power consumption of hardware components when workloads are experiencing bottlenecks elsewhere in the system. However, designing a resource controller capable of detecting and limiting power consumption on-the-fly is a complex issue and can also adversely impact application performance. In this paper, we explore the use of Reinforcement Learning (RL) to design a power capping policy on cloud compute nodes using observations on current power consumption and instantaneous application performance (heartbeats). By leveraging the Argo Node Resource Management (NRM) software stack in conjunction with the Intel Running Average Power Limit (RAPL) hardware control mechanism, we design an agent to control the maximum supplied power to processors without compromising on application performance. Employing a Proximal Policy Optimization (PPO) agent to learn an optimal policy on a mathematical model of the compute nodes, we demonstrate and evaluate using the STREAM benchmark how a trained agent running on actual hardware can take actions by balancing power consumption and application performance.
Submitted 15 August, 2023;
originally announced August 2023.
-
ChatGPT-based Investment Portfolio Selection
Authors:
Oleksandr Romanko,
Akhilesh Narayan,
Roy H. Kwon
Abstract:
In this paper, we explore potential uses of generative AI models, such as ChatGPT, for investment portfolio selection. Trusting investment advice from Generative Pre-Trained Transformer (GPT) models is a challenge due to model "hallucinations", necessitating careful verification and validation of the output. Therefore, we take an alternative approach. We use ChatGPT to obtain a universe of stocks from the S&P500 market index that are potentially attractive for investing. Subsequently, we compared various portfolio optimization strategies that utilized this AI-generated trading universe, evaluating those against quantitative portfolio optimization models as well as comparing to some of the popular investment funds. Our findings indicate that ChatGPT is effective in stock selection but may not perform as well in assigning optimal weights to stocks within the portfolio. However, when stock selection by ChatGPT is combined with established portfolio optimization models, we achieve even better results. By blending strengths of AI-generated stock selection with advanced quantitative optimization techniques, we observed the potential for more robust and favorable investment outcomes, suggesting a hybrid approach for more effective and reliable investment decision-making in the future.
Submitted 11 August, 2023;
originally announced August 2023.
-
Web crawler strategies for web pages under robot.txt restriction
Authors:
Piyush Vyas,
Akhilesh Chauhan,
Tushar Mandge,
Surbhi Hardikar
Abstract:
Nowadays, nearly everyone knows about the World Wide Web and works over the Internet daily. In this paper, we introduce how search engines handle the keywords that users enter to find something. A search engine uses different search algorithms to provide convenient results to the net surfer. Net surfers go with the top search results, but how do web pages get higher ranks in search engines? How does the search engine get all those web pages into its database? This paper answers these kinds of basic questions. Web crawlers working for search engines and the robot exclusion protocol rules for web crawlers are also addressed in this research paper. Webmasters use different restriction directives in the robot.txt file to instruct web crawlers; some basic formats of robot.txt are also mentioned in this paper.
Submitted 28 February, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
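The robot exclusion rules mentioned in the abstract above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser` module; it is not taken from the paper. The rule text, the crawler name `ExampleBot`, and the `example.com` URLs are hypothetical. Note that the file is conventionally named `robots.txt`, although the paper writes `robot.txt`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is barred from /private/ and nothing else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before fetching each URL.
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/data.html")
print(allowed, blocked)  # prints: True False
```

To check a live site instead of a string, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the file before `can_fetch()` is called.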
-
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
Authors:
Nghi D. Q. Bui,
Hung Le,
Yue Wang,
Junnan Li,
Akhilesh Deepak Gotmare,
Steven C. H. Hoi
Abstract:
Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling these tasks by leveraging massive open-source code data and programming language features. However, the development and deployment of such models often require expertise in both machine learning and software engineering, creating a barrier to model adoption. In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Following the principles of modular design and extensible framework, we design CodeTF with a unified interface to enable rapid access and development across different types of models, datasets and tasks. Our library supports a collection of pretrained Code LLM models and popular code benchmarks, including a standardized interface to train and serve code LLMs efficiently, and data features such as language-specific parsers and utility functions for extracting code attributes. In this paper, we describe the design principles, the architecture, key modules and components, and compare with other related library tools. Finally, we hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering, providing a comprehensive open-source solution for developers, researchers, and practitioners.
Submitted 31 May, 2023;
originally announced June 2023.
-
CodeT5+: Open Code Large Language Models for Code Understanding and Generation
Authors:
Yue Wang,
Hung Le,
Akhilesh Deepak Gotmare,
Nghi D. Q. Bui,
Junnan Li,
Steven C. H. Hoi
Abstract:
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Secondly, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degradation. To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives to mitigate the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction-tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning. We observe state-of-the-art (SoTA) model performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval tasks. Particularly, our instruction-tuned CodeT5+ 16B achieves new SoTA results on the HumanEval code generation task against other open code LLMs.
Submitted 20 May, 2023; v1 submitted 13 May, 2023;
originally announced May 2023.
-
TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation
Authors:
David Bruce Cousins,
Yuriy Polyakov,
Ahmad Al Badawi,
Matthew French,
Andrew Schmidt,
Ajey Jacob,
Benedict Reynwar,
Kellie Canida,
Akhilesh Jaiswal,
Clynn Mathew,
Homer Gamil,
Negar Neda,
Deepraj Soni,
Michail Maniatakos,
Brandon Reagen,
Naifeng Zhang,
Franz Franchetti,
Patrick Brinich,
Jeremy Johnson,
Patrick Broderick,
Mike Franusich,
Bo Zhang,
Zeming Cheng,
Massoud Pedram
Abstract:
Secure computation is of critical importance to not only the DoD, but across financial institutions, healthcare, and anywhere personally identifiable information (PII) is accessed. Traditional security techniques require data to be decrypted before performing any computation. When processed on untrusted systems the decrypted data is vulnerable to attacks to extract the sensitive information. To address these vulnerabilities Fully Homomorphic Encryption (FHE) keeps the data encrypted during computation and secures the results, even in these untrusted environments. However, FHE requires a significant amount of computation to perform equivalent unencrypted operations. To be useful, FHE must significantly close the computation gap (within 10x) to make encrypted processing practical. To accomplish this ambitious goal the TREBUCHET project is leading research and development in FHE processing hardware to accelerate deep computations on encrypted data, as part of the DARPA MTO Data Privacy for Virtual Environments (DPRIVE) program. We accelerate the major secure standardized FHE schemes (BGV, BFV, CKKS, FHEW, etc.) at >=128-bit security while integrating with the open-source PALISADE and OpenFHE libraries currently used in the DoD and in industry. We utilize a novel tile-based chip design with highly parallel ALUs optimized for vectorized 128b modulo arithmetic. The TREBUCHET coprocessor design provides a highly modular, flexible, and extensible FHE accelerator for easy reconfiguration, deployment, integration and application on other hardware form factors, such as System-on-Chip or alternate chip areas.
Submitted 18 April, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Technology-Circuit-Algorithm Tri-Design for Processing-in-Pixel-in-Memory (P2M)
Authors:
Md Abdullah-Al Kaiser,
Gourav Datta,
Sreetama Sarkar,
Souvik Kundu,
Zihan Yin,
Manas Garg,
Ajey P. Jacob,
Peter A. Beerel,
Akhilesh R. Jaiswal
Abstract:
The massive amounts of data generated by camera sensors motivate data processing inside pixel arrays, i.e., at the extreme-edge. Several critical developments have fueled recent interest in the processing-in-pixel-in-memory paradigm for a wide range of visual machine intelligence tasks, including (1) advances in 3D integration technology to enable complex processing inside each pixel in a 3D integrated manner while maintaining pixel density, (2) analog processing circuit techniques for massively parallel low-energy in-pixel computations, and (3) algorithmic techniques to mitigate non-idealities associated with analog processing through hardware-aware training schemes. This article presents a comprehensive technology-circuit-algorithm landscape that connects technology capabilities, circuit design strategies, and algorithmic optimizations to power, performance, area, bandwidth reduction, and application-level accuracy metrics. We present our results using a comprehensive co-design framework incorporating hardware and algorithmic optimizations for various complex real-life visual intelligence tasks mapped onto our P2M paradigm.
Submitted 6 April, 2023;
originally announced April 2023.
-
A Context-Switching/Dual-Context ROM Augmented RAM using Standard 8T SRAM
Authors:
Md Abdullah-Al Kaiser,
Edwin Tieu,
Ajey P. Jacob,
Akhilesh R. Jaiswal
Abstract:
The landscape of emerging applications has been continually widening, encompassing various data-intensive applications like artificial intelligence, machine learning, secure encryption, Internet-of-Things, etc. A sustainable approach toward creating dedicated hardware platforms that can cater to multiple applications often requires the underlying hardware to context-switch or support more than one context simultaneously. This paper presents a context-switching and dual-context memory based on the standard 8T SRAM bit-cell. Specifically, we exploit the availability of multi-VT transistors by selectively choosing the read-port transistors of the 8T SRAM cell to be either high-VT or low-VT. The 8T SRAM cell is thus augmented to store ROM data (represented as the VT of the transistors constituting the read-port) while simultaneously storing RAM data. Further, we propose specific sensing methodologies such that the memory array can support RAM-only or ROM-only mode (context-switching (CS) mode) or RAM and ROM mode simultaneously (dual-context (DC) mode). Extensive Monte-Carlo simulations have verified the robustness of our proposed ROM-augmented CS/DC memory on the Globalfoundries 22nm-FDX technology node.
Submitted 6 April, 2023;
originally announced April 2023.
-
Object Motion Sensitivity: A Bio-inspired Solution to the Ego-motion Problem for Event-based Cameras
Authors:
Shay Snyder,
Hunter Thompson,
Md Abdullah-Al Kaiser,
Gregory Schwartz,
Akhilesh Jaiswal,
Maryam Parsa
Abstract:
Neuromorphic (event-based) image sensors draw inspiration from the human retina to create an electronic device that can process visual stimuli in a way that closely resembles its biological counterpart. These sensors process information significantly differently from traditional RGB sensors. Specifically, the sensory information generated by event-based image sensors is orders of magnitude sparser than that of RGB sensors. The first generation of neuromorphic image sensors, the Dynamic Vision Sensor (DVS), is inspired by the computations confined to the photoreceptors and the first retinal synapse. In this work, we highlight the capability of the second generation of neuromorphic image sensors, Integrated Retinal Functionality in CMOS Image Sensors (IRIS), which aims to mimic full retinal computations from photoreceptors to the output of the retina (retinal ganglion cells) for targeted feature extraction. The feature of choice in this work is Object Motion Sensitivity (OMS), which is processed locally in the IRIS sensor. Our results show that OMS can accomplish standard computer vision tasks with similar efficiency to conventional RGB and DVS solutions but offers drastic bandwidth reduction. This cuts the wireless and computing power budgets and opens up vast opportunities in high-speed, robust, energy-efficient, and low-bandwidth real-time decision making.
Submitted 14 April, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
In-Sensor & Neuromorphic Computing are all you need for Energy Efficient Computer Vision
Authors:
Gourav Datta,
Zeyu Liu,
Md Abdullah-Al Kaiser,
Souvik Kundu,
Joe Mathai,
Zihan Yin,
Ajey P. Jacob,
Akhilesh R. Jaiswal,
Peter A. Beerel
Abstract:
Due to the high activation sparsity and use of accumulates (AC) instead of expensive multiply-and-accumulates (MAC), neuromorphic spiking neural networks (SNNs) have emerged as a promising low-power alternative to traditional DNNs for several computer vision (CV) applications. However, most existing SNNs require multiple time steps for acceptable inference accuracy, hindering real-time deployment and increasing spiking activity and, consequently, energy consumption. Recent works proposed direct encoding that directly feeds the analog pixel values in the first layer of the SNN in order to significantly reduce the number of time steps. Although the overhead for the first layer MACs with direct encoding is negligible for deep SNNs and the CV processing is efficient using SNNs, the data transfer between the image sensors and the downstream processing costs significant bandwidth and may dominate the total energy. To mitigate this concern, we propose an in-sensor computing hardware-software co-design framework for SNNs targeting image recognition tasks. Our approach reduces the bandwidth between sensing and processing by 12-96x and the resulting total energy by 2.32x compared to traditional CV processing, with a 3.8% reduction in accuracy on ImageNet.
Submitted 21 December, 2022;
originally announced December 2022.
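The abstract above attributes part of the energy advantage of SNNs to replacing multiply-and-accumulates (MAC) with accumulates (AC). The following minimal sketch shows why that substitution is possible when layer inputs are binary spikes: multiplying a weight by 0 or 1 reduces to skipping or adding it. The function names and the tiny weight matrix are hypothetical and are not taken from the paper.

```python
def mac_layer(inputs, weights):
    """Dense layer: one multiply and one add per weight (MAC)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def ac_layer(spikes, weights):
    """Spiking layer: inputs are 0/1, so only weights at active
    positions are added (AC) and no multiplication is needed."""
    return [sum(w for w, s in zip(row, spikes) if s) for row in weights]


# Hypothetical 2x3 weight matrix and a binary spike vector.
WEIGHTS = [[0.5, -1.0, 2.0],
           [1.5, 0.25, -0.75]]
SPIKES = [1, 0, 1]

if __name__ == "__main__":
    print(mac_layer(SPIKES, WEIGHTS))
    print(ac_layer(SPIKES, WEIGHTS))
```

On binary inputs both functions produce the same outputs, but `ac_layer` performs additions only for positions that spiked, so its work shrinks as activation sparsity grows.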
-
Scalable Pathogen Detection from Next Generation DNA Sequencing with Deep Learning
Authors:
Sai Narayanan,
Sathyanarayanan N. Aakur,
Priyadharsini Ramamurthy,
Arunkumar Bagavathi,
Vishalini Ramnath,
Akhilesh Ramachandran
Abstract:
Next-generation sequencing technologies have enhanced the scope of Internet-of-Things (IoT) to include genomics for personalized medicine through the increased availability of an abundance of genome data collected from heterogeneous sources at a reduced cost. Given the sheer magnitude of the collected data and the significant challenges posed by the presence of highly similar genomic structure across species, there is a need for robust, scalable analysis platforms to extract actionable knowledge such as the presence of potentially zoonotic pathogens. The emergence of zoonotic diseases from novel pathogens that can jump species barriers and lead to pandemics, such as the influenza virus in 1918 and SARS-CoV-2 in 2019, underscores the need for scalable metagenome analysis. In this work, we propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone, to learn robust features from raw metagenome sequences for downstream biomedical tasks such as targeted and generalized pathogen detection. Extensive experiments on four increasingly challenging, yet realistic diagnostic settings, show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples with minimal human supervision in the form of labels. Further, we demonstrate that the learned representations can generalize to completely unrelated pathogens across diseases and species for large-scale metagenome analysis. We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning and provide a way forward for extracting and using robust vector representations from low-cost next generation sequencing to develop generalizable diagnostic tools.
Submitted 29 November, 2022;
originally announced December 2022.
-
OGInfra: Geolocating Oil & Gas Infrastructure using Remote Sensing based Active Fire Data
Authors:
Samyak Prajapati,
Amrit Raj,
Yash Chaudhari,
Akhilesh Nandwal,
Japman Singh Monga
Abstract:
Remote sensing has become a crucial part of our daily lives, whether it be from triangulating our location using GPS or providing us with a weather forecast. It has multiple applications in domains such as military, socio-economical, commercial, and even in supporting humanitarian efforts. This work proposes a novel technique for the automated geo-location of Oil & Gas infrastructure with the use of Active Fire Data from the NASA FIRMS data repository & Deep Learning techniques; achieving a top accuracy of 90.68% with the use of ResNet101.
Submitted 30 October, 2022;
originally announced October 2022.
-
Enabling ISP-less Low-Power Computer Vision
Authors:
Gourav Datta,
Zeyu Liu,
Zihan Yin,
Linyu Sun,
Akhilesh R. Jaiswal,
Peter A. Beerel
Abstract:
In order to deploy current computer vision (CV) models on resource-constrained low-power devices, recent works have proposed in-sensor and in-pixel computing approaches that try to partly/fully bypass the image signal processor (ISP) and yield significant bandwidth reduction between the image sensor and the CV processing unit by downsampling the activation maps in the initial convolutional neural network (CNN) layers. However, direct inference on the raw images degrades the test accuracy due to the difference in covariance of the raw images captured by the image sensors compared to the ISP-processed images used for training. Moreover, it is difficult to train deep CV models on raw images, because most (if not all) large-scale open-source datasets consist of RGB images. To mitigate this concern, we propose to invert the ISP pipeline, which can convert the RGB images of any dataset to their raw counterparts, and enable model training on raw images. We release the raw version of the COCO dataset, a large-scale benchmark for generic high-level vision tasks. For ISP-less CV systems, training on these raw images results in a 7.1% increase in test accuracy on the visual wake words (VWW) dataset compared to relying on training with traditional ISP-processed RGB datasets. To further improve the accuracy of ISP-less CV models and to increase the energy and bandwidth benefits obtained by in-sensor/in-pixel computing, we propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations. When evaluated on raw images captured by real sensors from the PASCALRAW dataset, our approach results in an 8.1% increase in mAP. Lastly, we demonstrate a further 20.5% increase in mAP by using a novel application of few-shot learning with thirty shots each for the novel PASCALRAW dataset, constituting 3 classes.
Submitted 11 October, 2022;
originally announced October 2022.
-
A Thermal Machine Learning Solver For Chip Simulation
Authors:
Rishikesh Ranade,
Haiyang He,
Jay Pathak,
Norman Chang,
Akhilesh Kumar,
Jimin Wen
Abstract:
Thermal analysis provides deeper insights into electronic chip behavior under different temperature scenarios and enables faster design exploration. However, obtaining a detailed and accurate thermal profile on chip is very time-consuming using FEM or CFD. Therefore, there is an urgent need for speeding up the on-chip thermal solution to address various system scenarios. In this paper, we propose a thermal machine-learning (ML) solver to speed up thermal simulations of chips. The thermal ML-Solver is an extension of the recent novel approach, CoAEMLSim (Composable Autoencoder Machine Learning Simulator), with modifications to the solution algorithm to handle constant and distributed HTC. The proposed method is validated against commercial solvers, such as Ansys MAPDL, as well as a recent ML baseline, UNet, under different scenarios to demonstrate its enhanced accuracy, scalability, and generalizability.
Submitted 10 September, 2022;
originally announced September 2022.
-
Performance Modeling Sparse MTTKRP Using Optical Static Random Access Memory on FPGA
Authors:
Sasindu Wijeratne,
Akhilesh Jaiswal,
Ajey P. Jacob,
Bingyi Zhang,
Viktor Prasanna
Abstract:
Electrical static random access memory (E-SRAM) is the current standard for internal static memory in Field Programmable Gate Arrays (FPGAs). Despite the dramatic improvement in E-SRAM technology over the past decade, the goal of ultra-fast, energy-efficient static random access memory has yet to be achieved with E-SRAM technology. However, preliminary research into optical static random access memory (O-SRAM) has shown promising results in creating energy-efficient ultra-fast static memories.
This paper investigates the advantage of O-SRAM over E-SRAM in access speed and energy performance while executing sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP). spMTTKRP is an essential component of tensor decomposition algorithms, which are heavily used in data science applications. The evaluation results show O-SRAMs can achieve speedups of 1.1x to 2.9x while saving 2.8x to 8.1x energy compared to conventional E-SRAM technology.
Submitted 22 August, 2022;
originally announced August 2022.
-
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Authors:
Hung Le,
Yue Wang,
Akhilesh Deepak Gotmare,
Silvio Savarese,
Steven C. H. Hoi
Abstract:
Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure to train a code generation model only from the pairs of natural-language problem descriptions and ground-truth programs. Such a paradigm largely ignores some important but potentially useful signals in the problem specification such as unit tests, which thus often results in poor performance when solving complex unseen coding tasks. To address the limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, we treat the code-generating LM as an actor network, and introduce a critic network that is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor. During inference, we introduce a new generation procedure with a critical sampling strategy that allows a model to automatically regenerate programs based on feedback from example unit tests and critic scores. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data. Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability with new SOTA results on the simpler MBPP benchmark.
Submitted 3 November, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Vehicle Route Planning using Dynamically Weighted Dijkstra's Algorithm with Traffic Prediction
Authors:
Piyush Udhan,
Akhilesh Ganeshkar,
Poobigan Murugesan,
Abhishek Raj Permani,
Sameep Sanjeeva,
Parth Deshpande
Abstract:
Traditional vehicle routing algorithms do not consider the changing nature of traffic. While implementations of Dijkstra's algorithm with varying weights exist, the weights are often changed only after the algorithm has produced its outcome, which may not always result in the optimal route being chosen. Hence, this paper proposes a novel vehicle routing algorithm that improves upon Dijkstra's algorithm using a traffic prediction model based on the traffic flow in a road network. Here, Dijkstra's algorithm is adapted to be dynamic and time dependent using traffic flow theory principles during the planning stage itself. The model provides predicted traffic parameters and travel time across each edge of the road network at every time instant, leading to better routing results. The dynamic algorithm proposed here predicts changes in traffic conditions at each time step of planning to give the optimal forward-looking path. The proposed algorithm is verified by comparing it with conventional Dijkstra's algorithm on a graph with randomly simulated traffic, and is shown to predict the optimal route better with continuously changing traffic.
Submitted 30 May, 2022;
originally announced May 2022.
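The idea in the abstract above, evaluating edge weights at planning time rather than after the search, can be sketched as a time-dependent variant of Dijkstra's algorithm. This is a generic illustration and not the authors' algorithm or traffic model: the graph layout, the function name `time_dependent_dijkstra`, and the per-edge travel-time callables are all hypothetical. It also assumes that entering an edge later never yields an earlier arrival (the FIFO property), under which the label-setting search stays correct.

```python
import heapq

INF = float("inf")


def time_dependent_dijkstra(graph, source, target, depart=0.0):
    """Earliest-arrival search.

    graph maps node -> list of (neighbor, travel_time), where
    travel_time(t) is the predicted traversal time of that edge
    when it is entered at time t.
    Returns (arrival_time, path); (inf, []) if target is unreachable.
    """
    best = {source: depart}
    prev = {}
    heap = [(depart, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            break
        if t > best.get(node, INF):
            continue  # stale heap entry
        for nxt, travel_time in graph.get(node, []):
            arrival = t + travel_time(t)  # weight evaluated at entry time
            if arrival < best.get(nxt, INF):
                best[nxt] = arrival
                prev[nxt] = node
                heapq.heappush(heap, (arrival, nxt))
    if target not in best:
        return INF, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical network: edge B->D is fast now but congested from t = 5.
GRAPH = {
    "A": [("B", lambda t: 5), ("C", lambda t: 4)],
    "B": [("D", lambda t: 10 if t >= 5 else 1)],
    "C": [("D", lambda t: 4)],
}

if __name__ == "__main__":
    print(time_dependent_dijkstra(GRAPH, "A", "D"))
```

With weights frozen at departure time, A-B-D looks cheapest (5 + 1 = 6). Evaluating B->D at the actual entry time t = 5 gives 10, so the search returns A-C-D with arrival time 8.0 instead.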
-
Efficient Reinforcement Learning for Unsupervised Controlled Text Generation
Authors:
Bhargav Upadhyay,
Akhilesh Sudhakar,
Arjun Maheswaran
Abstract:
Controlled text generation tasks such as unsupervised text style transfer have increasingly adopted the use of Reinforcement Learning (RL). A major challenge in applying RL to such tasks is the sparse reward, which is available only after the full text is generated. Sparse rewards, combined with a large action space, make RL training sample-inefficient and difficult to converge. Recently proposed reward-shaping strategies to address this issue have shown only negligible gains. In contrast, this work proposes a novel approach that provides dense rewards to each generated token. We evaluate our approach by its usage in unsupervised text style transfer. Averaged across datasets, our style transfer system improves upon current state-of-the-art systems by 21% on human evaluation and 12% on automatic evaluation. Upon ablated comparison with the current reward shaping approach (the 'roll-out strategy'), using dense rewards improves the overall style transfer quality by 22% based on human evaluation. Further, the RL training is 2.5 times as sample efficient, and 7 times faster.
Submitted 15 April, 2022;
originally announced April 2022.
-
Toward Efficient Hyperspectral Image Processing inside Camera Pixels
Authors:
Gourav Datta,
Zihan Yin,
Ajey Jacob,
Akhilesh R. Jaiswal,
Peter A. Beerel
Abstract:
Hyperspectral cameras generate a large amount of data due to the presence of hundreds of spectral bands as opposed to only three channels (red, green, and blue) in traditional cameras. This requires a significant amount of data transmission between the hyperspectral image sensor and a processor used to classify/detect/track the images, frame by frame, expending high energy and causing bandwidth and security bottlenecks. To mitigate this problem, we propose a form of processing-in-pixel (PIP) that leverages advanced CMOS technologies to enable the pixel array to perform a wide range of complex operations required by the modern convolutional neural networks (CNN) for hyperspectral image recognition (HSI). Consequently, our PIP-optimized custom CNN layers effectively compress the input data, significantly reducing the bandwidth required to transmit the data downstream to the HSI processing unit. This reduces the average energy consumption associated with pixel array of cameras and the CNN processing unit by 25.06x and 3.90x respectively, compared to existing hardware implementations. Our custom models yield average test accuracies within 0.56% of the baseline models for the standard HSI benchmarks.
Submitted 10 March, 2022;
originally announced March 2022.
-
P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications
Authors:
Gourav Datta,
Souvik Kundu,
Zihan Yin,
Ravi Teja Lakkireddy,
Joe Mathai,
Ajey Jacob,
Peter A. Beerel,
Akhilesh R. Jaiswal
Abstract:
The demand to process vast amounts of data generated from state-of-the-art high resolution cameras has motivated novel energy-efficient on-device AI solutions. Visual data in such cameras are usually captured in the form of analog voltages by a sensor pixel array, and then converted to the digital domain for subsequent AI processing using analog-to-digital converters (ADC). Recent research has tried to take advantage of massively parallel low-power analog/digital computing in the form of near- and in-sensor processing, in which the AI computation is performed partly in the periphery of the pixel array and partly in a separate on-board CPU/accelerator. Unfortunately, high-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks. To mitigate this problem, we propose a novel Processing-in-Pixel-in-Memory (P2M) paradigm that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution, batch normalization, and ReLU (Rectified Linear Units). Our solution includes a holistic algorithm-circuit co-design approach and the resulting P2M paradigm can be used as a drop-in replacement for embedding memory-intensive first few layers of convolutional neural network (CNN) models within foundry-manufacturable CMOS image sensor platforms. Our experimental results indicate that P2M reduces data transfer bandwidth from sensors and analog to digital conversions by ~21x, and the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML use case for the visual wake words (VWW) dataset by up to ~11x compared to standard near-processing or in-sensor implementations, without any significant drop in test accuracy.
Submitted 16 March, 2022; v1 submitted 6 March, 2022;
originally announced March 2022.
-
FinGAN: Generative Adversarial Network for Analytical Customer Relationship Management in Banking and Insurance
Authors:
Prateek Kate,
Vadlamani Ravi,
Akhilesh Gangwar
Abstract:
Churn prediction in credit cards, fraud detection in insurance, and loan default prediction are important analytical customer relationship management (ACRM) problems. Since frauds, churns and defaults happen less frequently, the datasets for these problems turn out to be naturally highly unbalanced. Consequently, all supervised machine learning classifiers tend to yield substantial false-positive rates when trained on such unbalanced datasets. We propose two ways of data balancing. In the first, we propose an oversampling method to generate synthetic samples of the minority class using a Generative Adversarial Network (GAN). We employ Vanilla GAN [1], Wasserstein GAN [2] and CTGAN [3] separately to oversample the minority class samples. In order to assess the efficacy of our proposed approach, we use a host of machine learning classifiers, including Random Forest, Decision Tree, support vector machine (SVM), and Logistic Regression on the data balanced by GANs. In the second, we introduce a hybrid method to handle data imbalance. Here, we utilize the power of undersampling and oversampling together by augmenting the synthetic minority class data oversampled by GAN with the undersampled majority class data obtained by a one-class support vector machine (OCSVM) [4]. We combine the oversampled data generated by GAN and the data undersampled by OCSVM [4] and pass the resultant data to classifiers. Compared with the results of Farquad et al. [5] and Sundarkumar, Ravi, and Siddeshwar [6], our proposed methods achieve a higher area under the ROC curve (AUC) on all datasets.
Submitted 27 January, 2022;
originally announced January 2022.
-
Multi-Image Visual Question Answering
Authors:
Harsh Raj,
Janhavi Dadhania,
Akhilesh Bhardwaj,
Prabuchandran KJ
Abstract:
While a lot of work has been done on developing models to tackle the problem of Visual Question Answering, the ability of these models to relate the question to the image features still remains less explored. We present an empirical study of different feature extraction methods with different loss functions. We propose a new dataset for the task of Visual Question Answering with multiple image inputs having only one ground truth, and benchmark our results on it. Our final model, which uses ResNet + RCNN image features and BERT embeddings and is inspired by the stacked attention network, gives 39% word accuracy and 99% image accuracy on the CLEVER+TinyImagenet dataset.
Submitted 6 February, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
CryoCiM: Cryogenic Compute-in-Memory based on the Quantum Anomalous Hall Effect
Authors:
Shamiul Alam,
Md Mazharul Islam,
Md Shafayat Hossain,
Akhilesh Jaiswal,
Ahmedullah Aziz
Abstract:
The scaling of the already-matured CMOS technology is steadily approaching its physical limit, motivating the quest for a suitable alternative. Cryogenic operation offers a promising pathway towards continued improvement in computing speed and energy efficiency without aggressive scaling. However, the memory wall bottleneck of the traditional von-Neumann architecture persists even at cryogenic temperature. That is where a compute-in-memory (CiM) architecture, that embeds computing within the memory unit, comes into play. Computations within the memory unit help reduce the expensive data transfer between the memory and the computing units. Therefore, CiM provides extreme energy efficiency that can enable lower cooling cost at cryogenic temperature. In this work, we demonstrate CryoCiM, a cryogenic compute-in-memory framework utilizing a non-volatile memory system based on the quantum anomalous Hall effect (QAHE). Our design can perform memory read/write, and universal binary logic operations (NAND, NOR, and XOR). We design a novel peripheral circuit assembly that can perform the read/write, and single-cycle in-memory logic operations. The utilization of a QAHE-based memory system promises robustness against process variations, through the usage of topologically protected resistive states for data storage. CryoCiM is the first step towards utilizing exclusively cryogenic phenomena to serve the dual purpose of storage and computation with ultra-low power (nano-watts) operations.
Submitted 21 March, 2022; v1 submitted 30 November, 2021;
originally announced December 2021.
-
Metagenome2Vec: Building Contextualized Representations for Scalable Metagenome Analysis
Authors:
Sathyanarayanan N. Aakur,
Vineela Indla,
Vennela Indla,
Sai Narayanan,
Arunkumar Bagavathi,
Vishalini Laguduva Ramnath,
Akhilesh Ramachandran
Abstract:
Advances in next-generation metagenome sequencing have the potential to revolutionize the point-of-care diagnosis of novel pathogen infections, which could help prevent potential widespread transmission of diseases. Given the high volume of metagenome sequences, there is a need for scalable frameworks to analyze and segment metagenome sequences from clinical samples, which can be highly imbalanced. There is an increased need for learning robust representations from metagenome reads since pathogens within a family can have highly similar genome structures (some more than 90%) and hence enable the segmentation and identification of novel pathogen sequences with limited labeled data. In this work, we propose Metagenome2Vec - a contextualized representation that captures the global structural properties inherent in metagenome data and local contextualized properties through self-supervised representation learning. We show that the learned representations can help detect six (6) related pathogens from clinical samples with less than 100 labeled sequences. Extensive experiments on simulated and clinical metagenome data show that the proposed representation encodes compositional properties that can generalize beyond annotations to segment novel pathogens in an unsupervised setting.
Submitted 9 November, 2021;
originally announced November 2021.