Search | arXiv e-print repository

Towards Physics-informed Diffusion for Anomaly Detection in Trajectories

Authors: Arun Sharma, Mingzhou Yang, Majid Farhadloo, Subhankar Ghosh, Bharat Jayaprakash, Shashi Shekhar

Abstract: Given trajectory data, a domain-specific study area, and a user-defined threshold, we aim to find anomalous trajectories indicative of possible GPS spoofing (e.g., fake trajectory). The problem is societally important to curb illegal activities in international waters, such as unauthorized fishing and illicit oil transfers. The problem is challenging due to advances in AI-based deepfake generation (e.g., additive noise, fake trajectories) and the lack of an adequate number of labeled samples for ground-truth verification. Recent literature shows promising results for anomalous trajectory detection using generative models despite data sparsity. However, these methods do not consider fine-scale spatiotemporal dependencies and prior physical knowledge, resulting in higher false-positive rates. To address these limitations, we propose a physics-informed diffusion model that integrates kinematic constraints to identify trajectories that do not adhere to physical laws. Experimental results on real-world datasets in the maritime and urban domains show that the proposed framework results in higher prediction accuracy and lower estimation error rate for anomaly detection and trajectory generation methods, respectively. Our implementation is available at https://github.com/arunshar/Physics-Informed-Diffusion-Probabilistic-Model.

Submitted 14 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.
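
The "kinematic constraints" idea in the abstract above can be illustrated with a much simpler rule-based check: flag a trajectory if any pair of consecutive GPS fixes implies a speed above a physical limit. This is only a minimal sketch under stated assumptions, not the paper's diffusion model; the function names, the `(lat, lon, t_seconds)` tuple layout, and the speed limit are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def violates_speed_limit(track, max_speed_mps):
    """Return True if any consecutive pair of fixes implies an infeasible speed.

    track: list of (lat_deg, lon_deg, t_seconds) tuples sorted by time
    (hypothetical layout chosen for this sketch).
    """
    for (lat1, lon1, t1), (lat2, lon2, t2) in zip(track, track[1:]):
        dt = t2 - t1
        if dt <= 0:
            return True  # non-increasing timestamps are treated as invalid
        if haversine_m(lat1, lon1, lat2, lon2) / dt > max_speed_mps:
            return True
    return False
```

A check like this only catches gross violations; a learned model is needed for spoofed tracks that stay within plausible speeds.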

arXiv:2506.06773 [pdf, ps, other]

Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor

Authors: Emet Behrendt, Shing Wai Pun, Prashant J. Nair

Abstract: Branch prediction is key to the performance of out-of-order processors. While the CBP-2016 winner TAGE-SC-L combines geometric-history tables, a statistical corrector, and a loop predictor, over half of its remaining mispredictions stem from a small set of hard-to-predict (H2P) branches. These branches occur under diverse global histories, causing repeated thrashing in TAGE and eviction before usefulness counters can mature. Prior work shows that simply enlarging the tables offers only marginal improvement. We augment a 159 KB TAGE-SC-L predictor with a 28 KB H2P-targeted subsystem called the Bullseye predictor. It identifies problematic PCs using a set-associative H2P Identification Table (HIT) and steers them to one of two branch-specific perceptrons, one indexed by hashed local history and the other by folded global history. A short trial phase tracks head-to-head accuracy in an H2P cache. A branch becomes perceptron-resident only if the perceptron's sustained accuracy and output magnitude exceed dynamic thresholds, after which TAGE updates for that PC are suppressed to reduce pollution. The HIT, cache, and perceptron operate fully in parallel with TAGE-SC-L, providing higher fidelity on the H2P tail. This achieves an average MPKI of 3.4045 and CycWpPKI of 145.09.

Submitted 7 June, 2025; originally announced June 2025.

Comments: Paper accepted and presented at the 6th Championship Branch Prediction (CBP) workshop, co-held with ISCA 2025, on June 21, 2025, Tokyo, Japan

ACM Class: C.1.2; B.2.1; C.4; C.0
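
As background for the perceptrons mentioned in the abstract above, here is a minimal sketch of a classic single-branch perceptron predictor over a history register. It is not the Bullseye design: the class name, history length, and the threshold heuristic (the one commonly quoted for perceptron predictors) are assumptions, and the HIT, H2P cache, and trial phase are not modelled.

```python
class PerceptronPredictor:
    """Toy perceptron predictor for one branch (illustrative only)."""

    def __init__(self, history_len=16, threshold=None):
        self.history_len = history_len
        # Commonly quoted training-threshold heuristic; an assumption here.
        self.threshold = (threshold if threshold is not None
                          else int(1.93 * history_len + 14))
        self.weights = [0] * (history_len + 1)  # weights[0] is the bias
        self.history = [1] * history_len        # +1 = taken, -1 = not taken

    def output(self):
        """Signed confidence: bias plus dot product of weights and history."""
        return self.weights[0] + sum(
            w * h for w, h in zip(self.weights[1:], self.history))

    def predict(self):
        """Predict taken when the output is non-negative."""
        return self.output() >= 0

    def update(self, taken):
        """Train on a misprediction or a low-magnitude output, then shift history."""
        y = self.output()
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.weights[0] += t
            for i, h in enumerate(self.history):
                self.weights[i + 1] += t * h
        self.history = self.history[1:] + [t]
```

The output magnitude `abs(output())` is the kind of confidence signal that a scheme like the one described could compare against a threshold before trusting the perceptron.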

arXiv:2504.07048 [pdf, other]

Context Switching for Secure Multi-programming of Near-Term Quantum Computers

Authors: Avinash Kumar, Meng Wang, Chenxu Liu, Ang Li, Prashant J. Nair, Poulami Das

Abstract: Multi-programming quantum computers improve device utilization and throughput. However, crosstalk from concurrent two-qubit CNOT gates poses security risks, compromising the fidelity and output of co-running victim programs. We design Zero Knowledge Tampering Attacks (ZKTAs), using which attackers can exploit crosstalk without knowledge of the hardware error profile. ZKTAs can alter victim program outputs in 40% of cases on commercial systems. We identify that ZKTAs succeed because the attacker's program consistently runs with the same victim program in a fixed context. To mitigate this, we propose QONTEXTS: a context-switching technique that defends against ZKTAs by running programs across multiple contexts, each handling only a subset of trials. QONTEXTS uses multi-programming with frequent context switching while identifying a unique set of programs for each context. This helps limit only a fraction of execution to ZKTAs. We enhance QONTEXTS with attack detection capabilities that compare the distributions from different contexts against each other to identify noisy contexts executed with ZKTAs. Our evaluations on real IBMQ systems show that QONTEXTS increases program resilience by three orders of magnitude and fidelity by 1.33× on average. Moreover, QONTEXTS improves throughput by 2×, advancing security in multi-programmed environments.

Submitted 17 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.
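
The abstract above describes comparing output distributions from different contexts to spot noisy ones. One simple way to sketch that idea is to compute the total variation distance between each context's measurement histogram and the others, and flag contexts whose mean distance is large. The distance metric, the threshold, and all names below are assumptions for illustration; the listing does not say which statistic QONTEXTS actually uses.

```python
def normalize(counts):
    """Turn a dict of bitstring -> shot count into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def flag_outlier_contexts(context_counts, threshold):
    """Flag contexts whose mean distance to all other contexts exceeds threshold.

    context_counts: dict of context name -> {bitstring: count}
    (hypothetical layout chosen for this sketch).
    """
    dists = {name: normalize(c) for name, c in context_counts.items()}
    flagged = []
    for name, p in dists.items():
        others = [q for other, q in dists.items() if other != name]
        if not others:
            continue
        mean_tvd = sum(total_variation(p, q) for q in others) / len(others)
        if mean_tvd > threshold:
            flagged.append(name)
    return flagged
```

A mean-distance rule like this presumes that most contexts are clean; it would not separate attacked from clean contexts if the majority were tampered with.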

arXiv:2503.05648 [pdf, other]

Physics-based machine learning framework for predicting NOx emissions from compression ignition engines using on-board diagnostics data

Authors: Harish Panneer Selvam, Bharat Jayaprakash, Yan Li, Shashi Shekhar, William F. Northrop

Abstract: This work presents a physics-based machine learning framework to predict and analyze oxides of nitrogen (NOx) emissions from compression-ignition engine-powered vehicles using on-board diagnostics (OBD) data as input. Accurate NOx prediction from OBD datasets is difficult because NOx formation inside an engine combustion chamber is governed by complex processes occurring on timescales much shorter than the data collection rate. Thus, emissions generally cannot be predicted accurately using simple empirically derived physics models. Black box models like genetic algorithms or neural networks can be more accurate, but have poor interpretability. The transparent model presented in this paper has both high accuracy and can explain potential sources of high emissions. The proposed framework consists of two major steps: a physics-based NOx prediction model combined with a novel Divergent Window Co-occurrence (DWC) Pattern detection algorithm to analyze operating conditions that are not adequately addressed by the physics-based model. The proposed framework is validated for generalizability with a second vehicle OBD dataset, a sensitivity analysis is performed, and model predictions are compared with those from a deep neural network. The results show that NOx emissions predictions using the proposed model have around 55% better root mean square error, and around 60% higher mean absolute error compared to the baseline NOx prediction model from previously published work. The DWC Pattern Detection Algorithm identified low engine power conditions to have high statistical significance, indicating an operating regime where the model can be improved. This work shows that the physics-based machine learning framework is a viable method for predicting NOx emissions from engines that do not incorporate NOx sensing.

Submitted 7 March, 2025; originally announced March 2025.

arXiv:2503.00979 [pdf, ps, other]

Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs

Authors: Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant Nair, Poulami Das

Abstract: Autoregressive Transformers rely on Key-Value (KV) caching to accelerate inference. However, the linear growth of the KV cache with context length leads to excessive memory consumption and bandwidth constraints. This bottleneck is particularly problematic in real-time applications -- such as chatbots and interactive assistants -- where low latency and high memory efficiency are critical. Existing methods drop distant tokens or compress states in a lossy manner, sacrificing accuracy by discarding vital context or introducing bias. We propose MorphKV, an inference-time technique that maintains a constant-sized KV cache while preserving accuracy. MorphKV balances long-range dependencies and local coherence during text generation. It eliminates early-token bias while retaining high-fidelity context by adaptively ranking tokens through correlation-aware selection. Unlike heuristic retention or lossy compression, MorphKV iteratively refines the KV cache via lightweight updates guided by attention patterns of recent tokens. This approach captures inter-token correlation with greater accuracy, crucial for tasks like content creation and code generation. Our studies on long-response tasks show 52.9% memory savings and 18.2% higher accuracy on average compared to state-of-the-art prior works, enabling efficient real-world deployment.

Submitted 7 June, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

Comments: Published in the Proceedings of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada
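
The "linear growth of the KV cache with context length" noted in the abstract above follows from simple arithmetic: a conventional cache stores one key and one value vector per layer, per KV head, per token. The sketch below computes that size; the function name and the example model dimensions are hypothetical and not taken from the paper.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Size in bytes of an uncompressed KV cache.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to 16-bit floats.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape: 32 layers, 32 KV heads, head dimension 128.
size_4k = kv_cache_bytes(32, 32, 128, seq_len=4096)
size_8k = kv_cache_bytes(32, 32, 128, seq_len=8192)
```

Doubling `seq_len` doubles the result, which is the growth a constant-sized cache is meant to avoid.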

arXiv:2502.15013 [pdf, other]

Towards Physics-Guided Foundation Models

Authors: Majid Farhadloo, Arun Sharma, Mingzhou Yang, Bharat Jayaprakash, William Northrop, Shashi Shekhar

Abstract: Traditional foundation models are pre-trained on broad datasets to reduce the training resources (e.g., time, energy, labeled samples) needed for fine-tuning a wide range of downstream tasks. However, traditional foundation models struggle with out-of-distribution prediction and can produce outputs that are unrealistic and physically infeasible. We propose the notion of physics-guided foundation models (PGFM), that is, foundation models integrated with broad or general domain (e.g., scientific) physical knowledge applicable to a wide range of downstream tasks.

Submitted 23 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

arXiv:2501.18861 [pdf, other]

doi 10.1109/HPCA61900.2025.00080

QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues

Authors: Jeonghyun Woo, Chris S. Lin, Prashant J. Nair, Aamer Jaleel, Gururaj Saileshwar

Abstract: JEDEC has introduced the Per Row Activation Counting (PRAC) framework for DDR5 and future DRAMs to enable precise counting of DRAM row activations. PRAC enables a holistic mitigation of Rowhammer attacks even at ultra-low Rowhammer thresholds. PRAC uses an Alert Back-Off (ABO) protocol to request the memory controller to issue Rowhammer mitigation requests. However, recent PRAC implementations are either insecure or impractical. For example, Panopticon, the inspiration for PRAC, is rendered insecure if implemented per JEDEC's PRAC specification. On the other hand, the recent UPRAC proposal is impractical since it needs oracular knowledge of the 'top-N' activated DRAM rows that require mitigation. This paper provides the first secure, scalable, and practical RowHammer solution using the PRAC framework. The crux of our proposal is the design of a priority-based service queue (PSQ) for mitigations that prioritizes pending mitigations based on activation counts to avoid the security risks of prior solutions. This provides principled security using the reactive ABO protocol. Furthermore, we co-design our PSQ, with opportunistic mitigation on Refresh Management (RFM) operations and proactive mitigation during refresh (REF), to limit the performance impact of ABO-based mitigations. QPRAC provides secure and practical RowHammer mitigation that scales to Rowhammer thresholds as low as 71 while incurring a 0.8% slowdown for benign workloads, which further reduces to 0% with proactive mitigations.

Submitted 15 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

Comments: 15 pages, including appendices. The paper was presented at HPCA 2025 (https://hpca-conf.org/2025/)

Journal ref: 2025 IEEE Symposium on High-Performance Computer Architecture (HPCA 2025)
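
The idea of a queue that "prioritizes pending mitigations based on activation counts" can be sketched in software as a small fixed-capacity structure that retains the rows with the highest counts and serves the hottest one first. This is a toy model only: the class and method names, the eviction rule, and the dictionary-based storage are assumptions and do not describe the hardware PSQ in the paper.

```python
class PriorityServiceQueue:
    """Toy fixed-capacity queue of DRAM rows ordered by activation count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # row id -> latest activation count

    def record(self, row, count):
        """Insert or refresh a row; evict the coldest entry if a hotter row arrives."""
        if row in self.counts or len(self.counts) < self.capacity:
            self.counts[row] = count
            return
        coldest = min(self.counts, key=self.counts.get)
        if count > self.counts[coldest]:
            del self.counts[coldest]
            self.counts[row] = count

    def pop_for_mitigation(self):
        """Remove and return the row with the highest count, or None if empty."""
        if not self.counts:
            return None
        hottest = max(self.counts, key=self.counts.get)
        del self.counts[hottest]
        return hottest
```

Serving by count rather than arrival order means a row that is hammered heavily cannot be starved by a stream of lightly activated rows.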

arXiv:2501.18857 [pdf, other]

doi 10.1109/HPCA61900.2025.00079

DAPPER: A Performance-Attack-Resilient Tracker for RowHammer Defense

Authors: Jeonghyun Woo, Prashant J. Nair

Abstract: RowHammer vulnerabilities pose a significant threat to modern DRAM-based systems, where rapid activation of DRAM rows can induce bit-flips in neighboring rows. To mitigate this, state-of-the-art host-side RowHammer mitigations typically rely on shared counters or tracking structures. While these optimizations benefit benign applications, they are vulnerable to Performance Attacks (Perf-Attacks), where adversaries exploit shared structures to reduce DRAM bandwidth for co-running benign applications by increasing DRAM accesses for RowHammer counters or triggering repetitive refreshes required for the early reset of structures, significantly degrading performance. In this paper, we propose secure hashing mechanisms to thwart adversarial attempts to capture the mapping of shared structures. We propose DAPPER, a novel low-cost tracker resilient to Perf-Attacks even at ultra-low RowHammer thresholds. We first present a secure hashing template in the form of DAPPER-S. We then develop DAPPER-H, an enhanced version of DAPPER-S, incorporating double-hashing, novel reset strategies, and mitigative refresh techniques. Our security analysis demonstrates the effectiveness of DAPPER-H against both RowHammer and Perf-Attacks. Experiments with 57 workloads from SPEC2006, SPEC2017, TPC, Hadoop, MediaBench, and YCSB show that, even at an ultra-low RowHammer threshold of 500, DAPPER-H incurs only a 0.9% slowdown in the presence of Perf-Attacks while using only 96KB of SRAM per 32GB of DRAM memory.

Submitted 15 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

Comments: The initial version of this paper was submitted to MICRO 2024 on April 18, 2024. The final version was presented at HPCA 2025 (https://hpca-conf.org/2025) and is 16 pages long, including references

Journal ref: 2025 IEEE Symposium on High-Performance Computer Architecture (HPCA 2025)

arXiv:2501.16762 [pdf, other]

Rate-Distortion under Neural Tracking of Speech: A Directed Redundancy Approach

Authors: Jan Østergaard, Sangeeth Geetha Jayaprakash, Rodrigo Ordoñez

Abstract: The data acquired at different scalp EEG electrodes when human subjects are exposed to speech stimuli are highly redundant. The redundancy is partly due to volume conduction effects and partly due to localized regions of the brain synchronizing their activity in response to the stimuli. In a competing talker scenario, we use a recent measure of directed redundancy to assess the amount of redundant information that is causally conveyed from the attended stimuli to the left temporal region of the brain. We observe that for the attended stimuli, the transfer entropy as well as the directed redundancy is proportional to the correlation between the speech stimuli and the reconstructed signal from the EEG signals. This demonstrates that both the rate as well as the rate-redundancy are inversely proportional to the distortion in neural speech tracking. Thus, a greater rate indicates a greater redundancy between the electrode signals, and a greater correlation between the reconstructed signal and the attended stimuli. A similar relationship is not observed for the distracting stimuli.

Submitted 28 January, 2025; originally announced January 2025.

Comments: Accepted for IEEE Data Compression Conference

arXiv:2412.05649 [pdf, other]

RouteNet-Fermi: Network Modeling With GNN (Analysis And Re-implementation)

Authors: Shourya Verma, Simran Kadadi, Swathi Jayaprakash, Arpan Kumar Mahapatra, Ishaan Jain

Abstract: Network performance modeling presents important challenges in modern computer networks due to increasing complexity, scale, and diverse traffic patterns. While traditional approaches like queuing theory and packet-level simulation have served as foundational tools, they face limitations in modeling complex traffic behaviors and scaling to large networks. This project presents an extended implementation of RouteNet-Fermi, a Graph Neural Network (GNN) architecture designed for network performance prediction, with additional recurrent neural network variants. We improve the original architecture by implementing Long Short-Term Memory (LSTM) cells and Recurrent Neural Network (RNN) cells alongside the existing Gated Recurrent Unit (GRU) cells implementation. This work contributes to the understanding of recurrent neural architectures in GNN-based network modeling and provides a flexible framework for future experimentation with different cell types.

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2412.03853 [pdf, other]

Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer

Authors: Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado

Abstract: Transforming mathematical expressions into LaTeX poses a significant challenge. In this paper, we examine the application of advanced transformer-based architectures to address the task of converting handwritten or digital mathematical expression images into corresponding LaTeX code. As a baseline, we utilize the current state-of-the-art CNN encoder and LSTM decoder. Additionally, we explore enhancements to the CNN-RNN architecture by replacing the CNN encoder with the pretrained ResNet50 model, modified to suit the grayscale input. Further, we experiment with a vision transformer model and compare it with the baseline and CNN-LSTM models. Our findings reveal that the vision transformer architectures outperform the baseline CNN-RNN framework, delivering higher overall accuracy and BLEU scores while achieving lower Levenshtein distances. Moreover, these results highlight the potential for further improvement through fine-tuning of model parameters. To encourage open research, we also provide the model implementation, enabling reproduction of our results and facilitating further research in this domain.

Submitted 7 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: 7 pages; 3 figures
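
The Levenshtein distance used as a metric in the abstract above counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming computation follows; whether the paper measures it over characters or over LaTeX tokens is not stated in the listing, so character-level comparison is an assumption here.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (works on strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

For example, comparing a predicted `\frac{a}{b}` against a reference `\frac{a}{c}` gives a distance of 1, so lower values indicate predictions closer to the ground-truth LaTeX.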

arXiv:2406.00013 [pdf]

Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval

Authors: Jayaprakash Sundararaj

Abstract: Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems - i) summarizing a text document as a set of keywords/captions for image recommendation, ii) generating an opinion summary that has a good mix of relevancy and sentiment with the text document. Initially, we present our work on recommending images for enhancing a substantial amount of existing plain text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract key-phrases, which are re-ranked using a rank aggregation framework with a relevance feedback mechanism. We show that such rank aggregation and relevance feedback, which are typically used in tagging documents and text information retrieval, also help in improving image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images. Our proposed method is observed to perform better than all existing baselines. Additionally, we propose a set of submodular functions for opinion summarization. Opinion summarization has built into it the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and simultaneously extract a summary. The two tasks conflict in the sense that the demand of compression may drop sentiment-bearing sentences, and the demand of sentiment detection may bring in redundant sentences. However, using submodularity we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment along with a good ROUGE score. We also compare the performances of the proposed submodular functions.

Submitted 20 May, 2024; originally announced June 2024.

arXiv:2404.04270 [pdf, other]

Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings

Authors: Yassaman Ebrahimzadeh Maboud, Muhammad Adnan, Divya Mahajan, Prashant J. Nair

Abstract: Training recommendation models poses significant challenges regarding resource utilization and performance. Prior research has proposed an approach that categorizes embeddings into popular and non-popular classes to reduce the training time for recommendation models. We observe that, even among the popular embeddings, certain embeddings undergo rapid training and exhibit minimal subsequent variation, resulting in saturation. Consequently, updates to these embeddings lack any contribution to model quality. This paper presents Slipstream, a software framework that identifies stale embeddings on the fly and skips their updates to enhance performance. This capability enables Slipstream to achieve substantial speedup, optimize CPU-GPU bandwidth usage, and eliminate unnecessary memory access. Slipstream showcases training time reductions of 2x, 2.4x, 1.2x, and 1.175x across real-world datasets and configurations, compared to Baseline XDL, Intel-optimized DLRM, FAE, and Hotline, respectively.

Submitted 21 March, 2024; originally announced April 2024.
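
One simple way to picture "identifying stale embeddings on the fly" is to track, per embedding row, how many consecutive training steps produced a negligible update, and to skip rows once that streak is long enough. The sketch below does exactly that; the class name, the `eps` and `patience` parameters, and the use of an update norm as the signal are all assumptions, since the listing does not describe Slipstream's actual criterion.

```python
class StaleEmbeddingTracker:
    """Toy tracker that marks an embedding stale after repeated tiny updates."""

    def __init__(self, eps, patience):
        self.eps = eps              # update norms below this count as "no change"
        self.patience = patience    # consecutive quiet steps needed to mark stale
        self.quiet_steps = {}       # embedding id -> current quiet streak

    def observe(self, emb_id, update_norm):
        """Record the magnitude of the latest update applied to one embedding."""
        if update_norm < self.eps:
            self.quiet_steps[emb_id] = self.quiet_steps.get(emb_id, 0) + 1
        else:
            self.quiet_steps[emb_id] = 0  # any real movement resets the streak

    def is_stale(self, emb_id):
        """True when the embedding has been quiet for at least `patience` steps."""
        return self.quiet_steps.get(emb_id, 0) >= self.patience
```

A training loop could consult `is_stale` before fetching or updating a row, which is where savings in memory traffic would come from.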

arXiv:2403.09054 [pdf, other]

Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference

Authors: Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath

Abstract: Transformers have emerged as the underpinning architecture for Large Language Models (LLMs). In generative language models, the inference process involves two primary phases: prompt processing and token generation. Token generation, which constitutes the majority of the computational workload, primarily entails vector-matrix multiplications and interactions with the Key-Value (KV) Cache. This phase is constrained by memory bandwidth due to the overhead of transferring weights and KV cache values from the memory system to the computing units. This memory bottleneck becomes particularly pronounced in applications that require long-context and extensive text generation, both of which are increasingly crucial for LLMs. This paper introduces "Keyformer", an innovative inference-time approach, to mitigate the challenges associated with KV cache size and memory bandwidth utilization. Keyformer leverages the observation that approximately 90% of the attention weight in generative inference focuses on a specific subset of tokens, referred to as "key" tokens. Keyformer retains only the key tokens in the KV cache by identifying these crucial tokens using a novel score function. This approach effectively reduces both the KV cache size and memory bandwidth usage without compromising model accuracy. We evaluate Keyformer's performance across three foundational models: GPT-J, Cerebras-GPT, and MPT, which employ various positional embedding algorithms. Our assessment encompasses a variety of tasks, with a particular emphasis on summarization and conversation tasks involving extended contexts. Keyformer's reduction of KV cache reduces inference latency by 2.1x and improves token generation throughput by 2.4x, while preserving the model's accuracy.

Submitted 5 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

MSC Class: 68U35 ACM Class: I.2.7; C.0

Journal ref: Proceedings of the 7th Annual Conference on Machine Learning and Systems (MLSys), 2024
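
The retention step described in the abstract above (keep only the "key" tokens in the KV cache) can be sketched as a budgeted selection: always keep a window of the most recent positions and fill the rest of the budget with the older positions that have the highest score. The sketch uses plain accumulated attention weight as the score, which is a stand-in assumption; Keyformer's own score function is not given in the listing, and the function and parameter names are hypothetical.

```python
def select_kv_positions(acc_attention, budget, recent_window):
    """Choose which token positions to keep in a size-limited KV cache.

    acc_attention: per-position score, e.g. accumulated attention weight.
    budget: total number of positions to keep.
    recent_window: number of most recent positions that are always kept.
    Returns the kept positions in ascending order.
    """
    n = len(acc_attention)
    if budget >= n:
        return list(range(n))  # everything fits, nothing to drop
    if recent_window > budget:
        raise ValueError("recent_window must not exceed budget")
    recent_start = n - recent_window
    recent = list(range(recent_start, n))
    n_key = budget - recent_window
    # Rank the older positions by score and keep the top n_key of them.
    ranked = sorted(range(recent_start),
                    key=lambda i: acc_attention[i], reverse=True)
    return sorted(ranked[:n_key]) + recent
```

With scores `[0.9, 0.1, 0.5, 0.05, 0.2, 0.3]`, a budget of 4 and a recent window of 2, positions 0 and 2 are kept as key tokens alongside the recent positions 4 and 5.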

arXiv:2402.00197 [pdf, other]

doi 10.1021/acs.est.3c06447

Determination of Trace Organic Contaminant Concentration via Machine Classification of Surface-Enhanced Raman Spectra

Authors: Vishnu Jayaprakash, Jae Bem You, Chiranjeevi Kanike, Jinfeng Liu, Christopher McCallum, Xuehua Zhang

Abstract: Accurate detection and analysis of traces of persistent organic pollutants in water is important in many areas, including environmental monitoring and food quality control, due to their long environmental stability and potential bioaccumulation. While conventional analysis of organic pollutants requires expensive equipment, surface enhanced Raman spectroscopy (SERS) has demonstrated great potential for accurate detection of these contaminants. However, SERS analytical difficulties, such as spectral preprocessing, denoising, and substrate-based spectral variation, have hindered widespread use of the technique. Here, we demonstrate an approach for predicting the concentration of sample pollutants from messy, unprocessed Raman data using machine learning. Frequency domain transform methods, including the Fourier and Walsh-Hadamard transforms, are applied to sets of Raman spectra of three model micropollutants in water (rhodamine 6G, chlorpyrifos, and triclosan), which are then used to train machine learning algorithms. Using standard machine learning models, the concentration of sample pollutants is predicted with more than 80 percent cross-validation accuracy from raw Raman data. A cross-validation accuracy of 85 percent was achieved using deep learning for a moderately sized dataset (100 spectra), and 70 to 80 percent cross-validation accuracy was achieved even for very small datasets (50 spectra). Additionally, standard models were shown to accurately identify characteristic peaks via analysis of their importance scores. The approach shown here has the potential to be applied to facilitate accurate detection and analysis of persistent organic pollutants by surface-enhanced Raman spectroscopy.

Submitted 31 January, 2024; originally announced February 2024.
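
Of the two transforms named in the abstract above, the Walsh-Hadamard transform is the less familiar. A minimal sketch of the standard in-place fast algorithm (unnormalized, natural Hadamard ordering) is shown below. How the authors pad, order, or normalize their spectra is not stated in the listing, so those choices here are assumptions.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length sequence."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine blocks of size h into blocks of size 2h with butterflies.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a
```

Applying `fwht` twice returns the input scaled by its length, so dividing by `len(values)` after the second pass inverts the transform. A spectrum whose length is not a power of two would need padding or truncation first.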

arXiv:2310.01301 [pdf, ps, other]

Short Time Angular Impulse Response of Rayleigh Beams

Authors: Bidhayak Goswami, K. R. Jayaprakash, Anindya Chatterjee

Abstract: In the dynamics of linear structures, the impulse response function is of fundamental interest. In some cases one examines the short term response wherein the disturbance is still local and the boundaries have not yet come into play, and for such short-time analysis the geometrical extent of the structure may be taken as unbounded. Here we examine the response of slender beams to angular impulses. The Euler-Bernoulli model, which does not include rotary inertia of cross sections, predicts an unphysical and unbounded initial rotation at the point of application. A finite length Euler-Bernoulli beam, when modelled using finite elements, predicts a mesh-dependent response that shows fast large-amplitude oscillations setting in very quickly. The simplest introduction of rotary inertia yields the Rayleigh beam model, which has more reasonable behaviour including a finite wave speed at all frequencies. If a Rayleigh beam is given an impulsive moment at a location away from its boundaries, then the predicted behaviour has an instantaneous finite jump in local slope or rotation, followed by smooth evolution of the slope for a finite time interval until reflections arrive from the boundary, causing subsequent slope discontinuities in time. We present a detailed study of the angular impulse response of a simply supported Rayleigh beam, starting with dimensional analysis, followed by modal expansion including all natural frequencies, culminating with an asymptotic formula for the short-time response. The asymptotic formula is obtained by breaking the series solution into two parts to be treated independently term by term, and leads to a polynomial in time. The polynomial matches the response from refined finite element (FE) simulations.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2308.14902 [pdf, other]

Ad-Rec: Advanced Feature Interactions to Address Covariate-Shifts in Recommendation Networks

Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Abstract: Recommendation models are vital in delivering personalized user experiences by leveraging the correlation between multiple input features. However, deep learning-based recommendation models often face challenges due to evolving user behaviour and item features, leading to covariate shifts. Effective cross-feature learning is crucial to handle data distribution drift and adapt to changing user behaviour. Traditional feature interaction techniques have limitations in achieving optimal performance in this context. This work introduces Ad-Rec, an advanced network that leverages feature interaction techniques to address covariate shifts. This helps eliminate irrelevant interactions in recommendation tasks. Ad-Rec leverages masked transformers to enable the learning of higher-order cross-features while mitigating the impact of data distribution drift. Our approach improves model quality, accelerates convergence, and reduces training time, as measured by the Area Under Curve (AUC) metric. We demonstrate the scalability of Ad-Rec and its ability to achieve superior model quality through comprehensive ablation studies.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2307.02623 [pdf, other]

FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout

Authors: Irene Wang, Prashant J. Nair, Divya Mahajan

Abstract: Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.

Submitted 26 September, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

arXiv:2304.13807 [pdf, other]

A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks

Authors: Hyeonjung Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar

Abstract: Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.

Submitted 19 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Under review for ACM Computing Surveys journal. 29 pages

arXiv:2212.12613 [pdf, other]

doi 10.1109/HPCA56546.2023.10070999

Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems

Authors: Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair

Abstract: As Dynamic Random Access Memories (DRAM) scale, they are becoming increasingly susceptible to Row Hammer. By rapidly activating rows of DRAM cells (aggressor rows), attackers can exploit inter-cell interference through Row Hammer to flip bits in neighboring rows (victim rows). A recent work, called Randomized Row-Swap (RRS), proposed proactively swapping aggressor rows with randomly selected rows before an aggressor row can cause Row Hammer. Our paper observes that RRS is neither secure nor scalable. We first propose the 'Juggernaut attack pattern' that breaks RRS in under 1 day. Juggernaut exploits the fact that the mitigative action of RRS, a swap operation, can itself induce additional target row activations, defeating such a defense. Second, this paper proposes a new defense, the Secure Row-Swap mechanism, that avoids the additional activations from swap (and unswap) operations and protects against Juggernaut. Furthermore, this paper extends Secure Row-Swap with attack detection to defend against even future attacks. While this provides better security, it also allows for securely reducing the frequency of swaps, thereby enabling Scalable and Secure Row-Swap. The Scalable and Secure Row-Swap mechanism provides years of Row Hammer protection with 3.3X lower storage overheads as compared to the RRS design. It incurs only a 0.7% slowdown as compared to a not-secure baseline for a Row Hammer threshold of 1200.

Submitted 23 December, 2022; originally announced December 2022.

Journal ref: The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2023)

arXiv:2207.10793 [pdf, other]

doi 10.1145/3630614.3630616

The Dirty Secret of SSDs: Embodied Carbon

Authors: Swamit Tannu, Prashant J. Nair

Abstract: Scalable Solid-State Drives (SSDs) have ushered in a transformative era in data storage and accessibility, spanning both data centers and portable devices. However, the strides made in scaling this technology can bear significant environmental consequences. On a global scale, a notable portion of semiconductor manufacturing relies on electricity derived from coal and natural gas sources. A striking example of this is the manufacturing process for a single Gigabyte of Flash memory, which emits approximately 0.16 kg of CO2, a considerable fraction of the total carbon emissions attributed to the system. Remarkably, the manufacturing of storage devices alone contributed to an estimated 20 million metric tonnes of CO2 emissions in the year 2021. In light of these environmental concerns, this paper delves into an analysis of the sustainability trade-offs inherent in SSDs when compared to traditional Hard Disk Drives (HDDs). Moreover, this study proposes methodologies to gauge the embodied carbon costs associated with storage systems effectively. The research encompasses four key strategies to enhance the sustainability of storage systems. In summation, this paper critically addresses the embodied carbon issues associated with SSDs, comparing them with HDDs, and proposes a comprehensive framework of strategies to enhance the sustainability of storage systems.

Submitted 28 September, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Journal ref: Energy Informatics Review (Volume 3 Issue 3, October 2023)

arXiv:2204.05436 [pdf, other]

Heterogeneous Acceleration Pipeline for Recommendation System Training

Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Abstract: Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode utilizes High Bandwidth Memory (HBM) across multiple GPUs for storing embedding tables. However, this approach is expensive and presents scaling concerns. This paper introduces Hotline, a heterogeneous acceleration pipeline that addresses these concerns. Hotline develops a data-aware and model-aware scheduling pipeline by leveraging the insight that only a few embedding entries are frequently accessed (popular). This approach utilizes CPU main memory for non-popular embeddings and GPUs' HBM for popular embeddings. To achieve this, the Hotline accelerator fragments a mini-batch into popular and non-popular micro-batches. It gathers the necessary working parameters for non-popular micro-batches from the CPU, while GPUs execute popular micro-batches. The hardware accelerator dynamically coordinates the execution of popular embeddings on GPUs and non-popular embeddings from the CPU's main memory. Real-world datasets and models confirm Hotline's effectiveness, reducing average end-to-end training time by 2.2x compared to an Intel-optimized CPU-GPU DLRM baseline.

Submitted 28 April, 2024; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted at The International Symposium on Computer Architecture (ISCA), 2024

arXiv:2203.13892 [pdf, other]

doi 10.1145/3695053.3730992

Accelerating Simulation of Quantum Circuits under Noise via Computational Reuse

Authors: Meng Wang, Swamit Tannu, Prashant J. Nair

Abstract: To realize the full potential of quantum computers, we must mitigate qubit errors by developing noise-aware algorithms, compilers, and architectures. Thus, simulating quantum programs on high-performance computing (HPC) systems with different noise models is a de facto tool researchers use. Unfortunately, noisy simulators iteratively execute a similar circuit for thousands of trials, thereby incurring significant performance overheads. To address this, we propose a noisy simulation technique called Tree-Based Quantum Circuit Simulation (TQSim). TQSim exploits the reusability of intermediate results during the noisy simulation, reducing computation. TQSim dynamically partitions a circuit into several subcircuits. It then reuses the intermediate results from these subcircuits during computation. Compared to a noisy Qulacs-based baseline simulator, TQSim achieves a speedup of up to 3.89x for noisy simulations. TQSim is designed to be efficient with multi-node setups while also maintaining tight fidelity bounds.

Submitted 19 May, 2025; v1 submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted for publication in the Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25). Manuscript length: 15 pages

arXiv:2111.01354 [pdf, other]

SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection

Authors: Siddhartha Gairola, Murtuza Bohra, Nadeem Shaheer, Navya Jayaprakash, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it a leading cause of partial/complete blindness among such populations. We propose SmartKC, a low-cost, smartphone-based keratoconus diagnosis system comprising a 3D-printed placido's disc attachment, an LED light strip, and an intelligent smartphone app to capture the reflection of the placido rings on the cornea. An image processing pipeline analyzes the corneal image and uses the smartphone's camera parameters, the placido rings' 3D location, the pixel location of the reflected placido rings and the setup's working distance to construct the corneal surface, via the Arc-Step method and Zernike polynomials based surface fitting. In a clinical study with 101 distinct eyes, we found that SmartKC achieves a sensitivity of 94.1% and a specificity of 100.0%. Moreover, the quantitative curvature estimates (sim-K) strongly correlate with a gold-standard medical device (Pearson correlation coefficient = 0.78). Our results indicate that SmartKC has the potential to be used as a keratoconus screening tool under real-world medical settings.

Submitted 21 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Change Log: + Fixed sim-K computation (updated Section 5.5.3); re-ran our pipeline with the updated sim-K values (updated Figure 7); + Conducted the comparative evaluation with doctors again (total 4 doctors), and got improved results (updated Section 7.2 and Table 2); [Note: This is an updated version of the paper that was accepted for publication in IMWUT 2021.]

arXiv:2106.15034 [pdf, other]

Approximation Schemes for Capacitated Vehicle Routing on Graphs of Bounded Treewidth, Bounded Doubling, or Highway Dimension

Authors: Aditya Jayaprakash, Mohammad R. Salavatipour

Abstract: In this paper, we present Approximation Schemes for Capacitated Vehicle Routing Problem (CVRP) on several classes of graphs. In CVRP, introduced by Dantzig and Ramser (1959), we are given a graph $G=(V,E)$ with metric edge costs, a depot $r\in V$, and a vehicle of bounded capacity $Q$. The goal is to find a minimum-cost collection of tours for the vehicle that returns to the depot, each visiting at most $Q$ nodes, such that they cover all the nodes. This generalizes classic TSP and has been studied extensively. In the more general setting, each node $v$ has a demand $d_v$ and the total demand of each tour must be no more than $Q$. Either the demand of each node must be served by one tour (unsplittable) or can be served by multiple tours (splittable). The best known approximation algorithm for general graphs has ratio $α+2(1-ε)$ (for the unsplittable) and $α+1-ε$ (for the splittable) for some fixed $ε>\frac{1}{3000}$, where $α$ is the best approximation for TSP. Even for the case of trees, the best approximation ratio is $4/3$ by Becker (2018) and it has been an open question if there is an approximation scheme for this simple class of graphs. Das and Mathieu (2015) presented an approximation scheme with time $n^{\log^{O(1/ε)}n}$ for Euclidean plane $\mathbb{R}^2$. No other approximation scheme is known for any other class of metrics (without further restrictions on $Q$).
In this paper, we make significant progress on this classic problem by presenting Quasi-Polynomial Time Approximation Schemes (QPTAS) for graphs of bounded treewidth, graphs of bounded highway dimensions, and graphs of bounded doubling dimensions. For comparison, our result implies an approximation scheme for Euclidean plane with run time $n^{O(\log^{10}n/ε^{9})}$.

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2104.04598 [pdf, other]

Cross-Modal learning for Audio-Visual Video Parsing

Authors: Jatin Lamba, Abhishek, Jayaprakash Akula, Rishabh Dabral, Preethi Jyothi, Ganesh Ramakrishnan

Abstract: In this paper, we present a novel approach to the audio-visual video parsing (AVVP) task that demarcates events from a video separately for audio and visual modalities. The proposed parsing approach simultaneously detects the temporal boundaries in terms of start and end times of such events. We show how AVVP can benefit from the following techniques geared towards effective cross-modal learning: (i) adversarial training and skip connections, (ii) global context-aware attention, and (iii) self-supervised pretraining using an audio-video grounding objective to obtain cross-modal audio-video representations. We present extensive experimental evaluations on the Look, Listen, and Parse (LLP) dataset and show that we outperform the state-of-the-art Hybrid Attention Network (HAN) on all five metrics proposed for AVVP. We also present several ablations to validate the effect of pretraining, global attention and adversarial training.

Submitted 21 June, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: Work accepted at Interspeech 2021

arXiv:2103.05457 [pdf, other]

Rudder: A Cross Lingual Video and Text Retrieval Dataset

Authors: Jayaprakash A, Abhishek, Rishabh Dabral, Ganesh Ramakrishnan, Preethi Jyothi

Abstract: Video retrieval using natural language queries requires learning semantically meaningful joint embeddings between the text and the audio-visual input. Often, such joint embeddings are learnt using pairwise (or triplet) contrastive loss objectives which cannot give enough attention to 'difficult-to-retrieve' samples during training. This problem is especially pronounced in data-scarce settings where the data is relatively small (10% of the large scale MSR-VTT) to cover the rather complex audio-visual embedding space. In this context, we introduce Rudder - a multilingual video-text retrieval dataset that includes audio and textual captions in Marathi, Hindi, Tamil, Kannada, Malayalam and Telugu. Furthermore, we propose to compensate for data scarcity by using domain knowledge to augment supervision. To this end, in addition to the conventional three samples of a triplet (anchor, positive, and negative), we introduce a fourth term - a partial - to define a differential margin based partial order loss. The partials are heuristically sampled such that they semantically lie in the overlap zone between the positives and the negatives, thereby resulting in broader embedding coverage. Our proposals consistently outperform the conventional max-margin and triplet losses and improve the state-of-the-art on MSR-VTT and DiDeMO datasets. We report benchmark results on Rudder while also observing significant gains using the proposed partial order loss, especially when the language specific retrieval models are jointly trained by availing the cross-lingual alignment across the language-specific datasets.

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2103.00686 [pdf, other]

doi 10.14778/3485450.3485462

Accelerating Recommendation System Training by Leveraging Popular Choices

Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Abstract: Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representation of items' and users' categorical variables (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models is evolving to require increasing data and compute resources. The highly parallel neural networks portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper deep dives into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, the accesses to the embeddings are highly skewed with a few embedding entries being accessed up to 10000x more. This paper leverages this asymmetrical access pattern to offer a framework, called FAE, and proposes a hot-embedding aware data layout for training recommender models. This layout utilizes the scarce GPU memory for storing the highly accessed embeddings, thus reducing the data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the executions of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3x and 1.52x in comparison to XDL CPU-only and XDL CPU-GPU execution while maintaining baseline accuracy.

Submitted 28 September, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

ACM Class: I.2.6; C.5.0

Journal ref: Proceedings of the VLDB Endowment, 2022

arXiv:2006.08361 [pdf, other]

An Unsupervised Machine Learning Approach to Assess the ZIP Code Level Impact of COVID-19 in NYC

Authors: Fadoua Khmaissia, Pegah Sagheb Haghighi, Aarthe Jayaprakash, Zhenwei Wu, Sokratis Papadopoulos, Yuan Lai, Freddy T. Nguyen

Abstract: New York City has been recognized as the world's epicenter of the novel Coronavirus pandemic. To identify the key inherent factors that are highly correlated to the Increase Rate of COVID-19 new cases in NYC, we propose an unsupervised machine learning framework. Based on the assumption that ZIP code areas with similar demographic, socioeconomic, and mobility patterns are likely to experience similar outbreaks, we select the most relevant features to perform a clustering that can best reflect the spread, and map them down to 9 interpretable categories. We believe that our findings can guide policy makers to promptly anticipate and prevent the spread of the virus by taking the right measures.

Submitted 18 September, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Presented at ICML 2020 Workshop on the Healthcare Systems, Population Health, and the Role of Health-Tech

arXiv:1909.00553 [pdf, ps, other]

doi 10.1145/3352460.3358281

Touché: Towards Ideal and Efficient Cache Compression By Mitigating Tag Area Overheads

Authors: Seokin Hong, Bulent Abali, Alper Buyuktosunoglu, Michael B. Healy, Prashant J. Nair

Abstract: Compression is seen as a simple technique to increase the effective cache capacity. Unfortunately, compression techniques either incur tag area overheads or restrict data placement to only include neighboring compressed cache blocks to mitigate tag area overheads. Ideally, we should be able to place arbitrary compressed cache blocks without any placement restrictions and tag area overheads. This paper proposes Touché, a framework that enables storing multiple arbitrary compressed cache blocks within a physical cacheline without any tag area overheads. The Touché framework consists of three components. The first component, called the "Signature" (SIGN) engine, creates shortened signatures from the tag addresses of compressed blocks. Due to this, the SIGN engine can store multiple signatures in each tag entry. On a cache access, the physical cacheline is accessed only if there is a signature match (which has a negligible probability of false positive). The second component, called the "Tag Appended Data" (TADA) mechanism, stores the full tag addresses with data. TADA enables Touché to detect false positive signature matches by ensuring that the actual tag address is available for comparison. The third component, called the "Superblock Marker" (SMARK) mechanism, uses a unique marker in the tag entry to indicate the occurrence of compressed cache blocks from neighboring physical addresses in the same cacheline. Touché is completely hardware-based and achieves an average speedup of 12% (ideal 13%) when compared to an uncompressed baseline.

Submitted 2 September, 2019; originally announced September 2019.

Comments: Keywords: Compression, Caches, Tag Array, Data Array, Hashing

Journal ref: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, October 2019, Pages 453-465

arXiv:1804.07347 [pdf, other]

doi 10.1117/1.JRS.14.036507

Randomized ICA and LDA Dimensionality Reduction Methods for Hyperspectral Image Classification

Authors: Chippy Jayaprakash, Bharath Bhushan Damodaran, Sowmya V, K P Soman

Abstract: Dimensionality reduction is an important step in processing hyperspectral images (HSI) to overcome the curse of dimensionality. Linear dimensionality reduction methods such as independent component analysis (ICA) and linear discriminant analysis (LDA) are commonly employed to reduce the dimensionality of HSI. These methods fail to capture non-linear dependencies in the HSI data, as the data lies on a nonlinear manifold. To handle this, nonlinear transformation techniques based on kernel methods were introduced for dimensionality reduction of HSI. However, kernel methods involve cubic computational complexity while computing the kernel matrix, and thus their potential cannot be explored when the number of pixels (samples) is large. In the literature, a smaller number of pixels is randomly selected to partially overcome this issue; however, this sub-optimal strategy might neglect important information in the HSI. In this paper, we propose randomized solutions to the ICA and LDA dimensionality reduction methods using Random Fourier features, and we label them RFFICA and RFFLDA. Our proposed method overcomes the scalability issue and handles the non-linearities present in the data more efficiently. Experiments conducted with two real-world hyperspectral datasets demonstrate that our proposed randomized methods outperform the conventional kernel ICA and kernel LDA in terms of overall and per-class accuracies and computational time.

Submitted 19 April, 2018; originally announced April 2018.

Comments: Submitted IEEE JSTARS

Journal ref: J. of Applied Remote Sensing, 14(3), 036507 (2020)

arXiv:1802.06970 [pdf, other]

ISA-Based Trusted Network Functions And Server Applications In The Untrusted Cloud

Authors: Spyridon Mastorakis, Tahrina Ahmed, Jayaprakash Pisharath

Abstract: Nowadays, enterprises widely deploy Network Functions (NFs) and server applications in the cloud. However, processing of sensitive data and trusted execution cannot be securely deployed in the untrusted cloud. Cloud providers themselves could accidentally leak private information (e.g., due to misconfigurations) or rogue users could exploit vulnerabilities of the providers' systems to compromise execution integrity, posing a threat to the confidentiality of internal enterprise and customer data. In this paper, we (i) identify a number of NF and server application use-cases that trusted execution can be applied to, (ii) identify the assets and impact of compromising the private data and execution integrity of each use-case, and (iii) leverage Intel's Software Guard Extensions (SGX) architecture to design Trusted Execution Environments (TEEs) for cloud-based NFs and server applications. We combine SGX with the Data Plane Development Kit (DPDK) to prototype and evaluate our TEEs for a number of application scenarios (Layer 2 frame and Layer 3 packet processing for plain and encrypted traffic, traffic load-balancing and back-end server processing). Our results indicate that NFs involving plain traffic can achieve almost native performance (e.g., ~22 Million Packets Per Second for Layer 3 forwarding for 64-byte frames), while NFs involving encrypted traffic and server processing can still achieve competitive performance (e.g., ~12 Million Packets Per Second for server processing for 64-byte frames).

Submitted 20 February, 2018; originally announced February 2018.

arXiv:1704.03991 [pdf, ps, other]

Architectural Techniques to Enable Reliable and Scalable Memory Systems

Authors: Prashant J. Nair

Abstract: High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. This dissertation describes how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability for memory systems while incurring negligible overheads.

Submitted 13 April, 2017; originally announced April 2017.

Comments: PhD thesis, Georgia Institute of Technology (May 2017)

arXiv:1603.06297 [pdf]

Notes on "An Effective ECC based User Access Control Scheme with Attribute based Encryption for WSN"

Authors: Mrudula S, ChandraMouli Reddy, Lakshmi Narayana, JayaPrakash, Chandra Sekhar Vorugunti

Abstract: The rapid growth of networking and communication technologies has resulted in the amalgamation of the 'Internet of Things' and 'Wireless sensor networks' to form WSNIT. WSNIT enables the WSN to connect dynamically to the Internet and exchange data with the external world. The critical data stored in sensor nodes, such as data related to patient health or the environment, can be accessed by attackers via the insecure Internet. To counter this, there is a demand for data integrity and controlled data access through highly secure and lightweight authentication schemes. In this context, Santanu et al. proposed an attribute-based authentication framework for WSN and discussed its security strengths. In this paper, we thoroughly analyze the scheme of Santanu et al. and show that it is susceptible to privileged insider attacks and node capture attacks. We also demonstrate that the scheme contains major inconsistencies which restrict the protocol execution.

Submitted 18 March, 2016; originally announced March 2016.

Comments: AIMOC 2016 Jadavpur university

arXiv:1310.3424 [pdf, ps, other]

doi 10.1109/JSTSP.2014.2361315

Optimal Energy Consumption Model for Smart Grid Households with Energy Storage

Authors: Jayaprakash Rajasekharan, Visa Koivunen

Abstract: In this paper, we propose to model the energy consumption of smart grid households with energy storage systems as an intertemporal trading economy. Intertemporal trade refers to transaction of goods across time when an agent, at any time, is faced with the option of consuming or saving with the aim of using the savings in the future or spending the savings from the past. Smart homes define optimal consumption as either balancing/leveling consumption such that the utility company is presented with a uniform demand or as minimizing consumption costs by storing energy during off-peak time periods when prices are lower and using the stored energy during peak time periods when prices are higher. Due to the varying nature of energy requirements of household and market energy prices over different time periods in a day, households face a trade-off between consuming to meet their current energy requirements and/or storing energy for future consumption and/or spending energy stored in the past. These trade-offs or consumption preferences of the household are modeled as utility functions using consumer theory. We introduce two different utility functions, one for cost minimization and another for consumption balancing/leveling, that are maximized subject to respective budget, consumption, storage and savings constraints to solve for the optimum consumption profile. The optimization problem of a household with energy storage is formulated as a geometric program for consumption balancing/leveling, while cost minimization is formulated as a linear programming problem. Simulation results show that the proposed model achieves extremely low peak to average ratio in the consumption balancing/leveling scheme with about 8% reduction in consumption costs and the least possible amount for electricity bill with about 12% reduction in consumption costs in the cost minimization scheme.

Submitted 12 October, 2013; originally announced October 2013.

Comments: 26 pages, 9 figures, 34 equations

arXiv:1112.1520 [pdf, ps, other]

Cooperative Game-Theoretic Approach to Spectrum Sharing in Cognitive Radios

Authors: Jayaprakash Rajasekharan, Jan Eriksson, Visa Koivunen

Abstract: In this paper, a novel framework for normative modeling of the spectrum sensing and sharing problem in cognitive radios (CRs) as a transferable utility (TU) cooperative game is proposed. Secondary users (SUs) jointly sense the spectrum and cooperatively detect the primary user (PU) activity for identifying and accessing unoccupied spectrum bands. The games are designed to be balanced and super-additive so that resource allocation is possible and provides SUs with an incentive to cooperate and form the grand coalition. The characteristic function of the game is derived based on the worths of SUs, calculated according to the amount of work done for the coalition in terms of reduction in uncertainty about PU activity. According to her worth in the coalition, each SU gets a pay-off that is computed using various one-point solutions such as Shapley value, τ-value and Nucleolus. Depending upon their data rate requirements for transmission, SUs use the earned pay-off to bid for idle channels through a socially optimal Vickrey-Clarke-Groves (VCG) auction mechanism. Simulation results show that, in comparison with other resource allocation models, the proposed cooperative game-theoretic model provides the best balance between fairness, cooperation and performance in terms of data rates achieved by each SU.

Submitted 7 December, 2011; originally announced December 2011.

Comments: 11 pages, 9 figures, 6 tables, journal
