Search | arXiv e-print repository

Widely Linear Augmented Extreme Learning Machine Based Impairments Compensation for Satellite Communications

Authors: Yang Luo, Arunprakash Jayaprakash, Gaojie Chen, Chong Huang, Qu Luo, Pei Xiao

Abstract: Satellite communications are crucial for the evolution beyond fifth-generation networks. However, the dynamic nature of satellite channels and their inherent impairments present significant challenges. In this paper, a novel post-compensation scheme that combines the complex-valued extreme learning machine with augmented hidden layer (CELMAH) architecture and widely linear processing (WLP) is developed to address these issues by exploiting signal impropriety in satellite communications. Although CELMAH shares structural similarities with WLP, it employs a different core algorithm and does not fully exploit the signal impropriety. By incorporating WLP principles, we derive a tailored formulation suited to the network structure and propose the CELM augmented by widely linear least squares (CELM-WLLS) for post-distortion. The proposed approach offers enhanced communication robustness and is highly effective for satellite communication scenarios characterized by dynamic channel conditions and non-linear impairments. CELM-WLLS is designed to improve signal recovery performance and outperform traditional methods such as least square (LS) and minimum mean square error (MMSE). Compared to CELMAH, CELM-WLLS demonstrates approximately 0.8 dB gain in BER performance, and also achieves a two-thirds reduction in computational complexity, making it a more efficient solution.

Submitted 19 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

Comments: 12 pages, accepted for publication in IEEE Transactions on Vehicular Technology
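
As a rough illustration of the widely linear processing this abstract refers to, the sketch below fits a scalar widely linear model y ≈ a*x + b*conj(x) by least squares. It is not the CELM-WLLS scheme from the paper: the function name `widely_linear_ls`, the memoryless scalar model, and the sample values are all assumptions made for the example.

```python
def widely_linear_ls(x, y):
    """Least-squares fit of y ~ a*x + b*conj(x) for complex samples.

    Hypothetical helper for illustration; a scalar, memoryless model only.
    """
    # Entries of the 2x2 normal equations built from the columns [x, conj(x)].
    p = sum(abs(v) ** 2 for v in x)                    # sum |x_i|^2 (real)
    q = sum(v * v for v in x)                          # sum x_i^2 (complex)
    r1 = sum(v.conjugate() * w for v, w in zip(x, y))  # sum conj(x_i) * y_i
    r2 = sum(v * w for v, w in zip(x, y))              # sum x_i * y_i
    det = p * p - abs(q) ** 2
    if abs(det) < 1e-12:
        raise ValueError("x and conj(x) are linearly dependent; fit is not unique")
    a = (p * r1 - q.conjugate() * r2) / det
    b = (p * r2 - q * r1) / det
    return a, b


# Synthetic data (assumed values): the conj(x) term makes the relation widely linear.
x = [1 + 1j, 2 - 1j, -1 + 0.5j, 0.3j]
a_true, b_true = 0.9 + 0.1j, 0.2 - 0.05j
y = [a_true * v + b_true * v.conjugate() for v in x]
a_hat, b_hat = widely_linear_ls(x, y)
```

A strictly linear fit would estimate only `a` and leave the conj(x) contribution in the residual, which is the intuition behind exploiting signal impropriety.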

arXiv:2506.06999 [pdf, ps, other]

Towards Physics-informed Diffusion for Anomaly Detection in Trajectories

Authors: Arun Sharma, Mingzhou Yang, Majid Farhadloo, Subhankar Ghosh, Bharat Jayaprakash, Shashi Shekhar

Abstract: Given trajectory data, a domain-specific study area, and a user-defined threshold, we aim to find anomalous trajectories indicative of possible GPS spoofing (e.g., fake trajectory). The problem is societally important to curb illegal activities in international waters, such as unauthorized fishing and illicit oil transfers. The problem is challenging due to advances in AI-generated deep fakes (e.g., additive noise, fake trajectories) and the lack of an adequate amount of labeled samples for ground-truth verification. Recent literature shows promising results for anomalous trajectory detection using generative models despite data sparsity. However, they do not consider fine-scale spatiotemporal dependencies and prior physical knowledge, resulting in higher false-positive rates. To address these limitations, we propose a physics-informed diffusion model that integrates kinematic constraints to identify trajectories that do not adhere to physical laws. Experimental results on real-world datasets in the maritime and urban domains show that the proposed framework results in higher prediction accuracy and lower estimation error rate for anomaly detection and trajectory generation methods, respectively. Our implementation is available at https://github.com/arunshar/Physics-Informed-Diffusion-Probabilistic-Model.

Submitted 14 June, 2025; v1 submitted 8 June, 2025; originally announced June 2025.
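
To make the idea of a kinematic constraint concrete, the sketch below flags a trajectory whose consecutive fixes would require an implausible speed. It is a standalone rule-based check, not the diffusion model proposed in the paper; the function names, the planar (t, x, y) representation in seconds and meters, and the 15 m/s limit are assumptions chosen for the example.

```python
import math


def implied_speeds(track):
    """Speeds (m/s) between consecutive (t_seconds, x_m, y_m) fixes."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


def violates_speed_limit(track, max_speed):
    """True if any segment would require moving faster than max_speed (m/s)."""
    return any(s > max_speed for s in implied_speeds(track))


# Hypothetical tracks: the second one jumps about 50 km in one minute.
plausible = [(0, 0.0, 0.0), (60, 300.0, 0.0), (120, 600.0, 100.0)]
spoofed = [(0, 0.0, 0.0), (60, 300.0, 0.0), (120, 50000.0, 0.0)]
plausible_flag = violates_speed_limit(plausible, max_speed=15.0)
spoofed_flag = violates_speed_limit(spoofed, max_speed=15.0)
```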

arXiv:2506.06773 [pdf, ps, other]

Taming Wild Branches: Overcoming Hard-to-Predict Branches using the Bullseye Predictor

Authors: Emet Behrendt, Shing Wai Pun, Prashant J. Nair

Abstract: Branch prediction is key to the performance of out-of-order processors. While the CBP-2016 winner TAGE-SC-L combines geometric-history tables, a statistical corrector, and a loop predictor, over half of its remaining mispredictions stem from a small set of hard-to-predict (H2P) branches. These branches occur under diverse global histories, causing repeated thrashing in TAGE and eviction before usefulness counters can mature. Prior work shows that simply enlarging the tables offers only marginal improvement. We augment a 159 KB TAGE-SC-L predictor with a 28 KB H2P-targeted subsystem called the Bullseye predictor. It identifies problematic PCs using a set-associative H2P Identification Table (HIT) and steers them to one of two branch-specific perceptrons, one indexed by hashed local history and the other by folded global history. A short trial phase tracks head-to-head accuracy in an H2P cache. A branch becomes perceptron-resident only if the perceptron's sustained accuracy and output magnitude exceed dynamic thresholds, after which TAGE updates for that PC are suppressed to reduce pollution. The HIT, cache, and perceptron operate fully in parallel with TAGE-SC-L, providing higher fidelity on the H2P tail. This achieves an average MPKI of 3.4045 and CycWpPKI of 145.09.

Submitted 7 June, 2025; originally announced June 2025.

Comments: Paper accepted and presented at the 6th Championship Branch Prediction (CBP) workshop, co-held with ISCA 2025, on June 21, 2025, Tokyo, Japan

ACM Class: C.1.2; B.2.1; C.4; C.0

arXiv:2504.07048 [pdf, other]

Context Switching for Secure Multi-programming of Near-Term Quantum Computers

Authors: Avinash Kumar, Meng Wang, Chenxu Liu, Ang Li, Prashant J. Nair, Poulami Das

Abstract: Multi-programming quantum computers improve device utilization and throughput. However, crosstalk from concurrent two-qubit CNOT gates poses security risks, compromising the fidelity and output of co-running victim programs. We design Zero Knowledge Tampering Attacks (ZKTAs), using which attackers can exploit crosstalk without knowledge of the hardware error profile. ZKTAs can alter victim program outputs in 40% of cases on commercial systems. We identify that ZKTAs succeed because the attacker's program consistently runs with the same victim program in a fixed context. To mitigate this, we propose QONTEXTS: a context-switching technique that defends against ZKTAs by running programs across multiple contexts, each handling only a subset of trials. QONTEXTS uses multi-programming with frequent context switching while identifying a unique set of programs for each context. This exposes only a fraction of the execution to ZKTAs. We enhance QONTEXTS with attack detection capabilities that compare the distributions from different contexts against each other to identify noisy contexts executed with ZKTAs. Our evaluations on real IBMQ systems show that QONTEXTS increases program resilience by three orders of magnitude and fidelity by 1.33$\times$ on average. Moreover, QONTEXTS improves throughput by 2$\times$, advancing security in multi-programmed environments.

Submitted 17 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

arXiv:2503.05648 [pdf, other]

Physics-based machine learning framework for predicting NOx emissions from compression ignition engines using on-board diagnostics data

Authors: Harish Panneer Selvam, Bharat Jayaprakash, Yan Li, Shashi Shekhar, William F. Northrop

Abstract: This work presents a physics-based machine learning framework to predict and analyze oxides of nitrogen (NOx) emissions from compression-ignition engine-powered vehicles using on-board diagnostics (OBD) data as input. Accurate NOx prediction from OBD datasets is difficult because NOx formation inside an engine combustion chamber is governed by complex processes occurring on timescales much shorter than the data collection rate. Thus, emissions generally cannot be predicted accurately using simple empirically derived physics models. Black box models like genetic algorithms or neural networks can be more accurate, but have poor interpretability. The transparent model presented in this paper is both highly accurate and able to explain potential sources of high emissions. The proposed framework consists of two major steps: a physics-based NOx prediction model combined with a novel Divergent Window Co-occurrence (DWC) Pattern detection algorithm to analyze operating conditions that are not adequately addressed by the physics-based model. The proposed framework is validated for generalizability with a second vehicle OBD dataset, a sensitivity analysis is performed, and model predictions are compared with those from a deep neural network. The results show that NOx emissions predictions using the proposed model have around 55% better root mean square error, and around 60% higher mean absolute error compared to the baseline NOx prediction model from previously published work. The DWC Pattern Detection Algorithm identified low engine power conditions to have high statistical significance, indicating an operating regime where the model can be improved. This work shows that the physics-based machine learning framework is a viable method for predicting NOx emissions from engines that do not incorporate NOx sensing.

Submitted 7 March, 2025; originally announced March 2025.

arXiv:2503.00979 [pdf, ps, other]

Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs

Authors: Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant Nair, Poulami Das

Abstract: Autoregressive Transformers rely on Key-Value (KV) caching to accelerate inference. However, the linear growth of the KV cache with context length leads to excessive memory consumption and bandwidth constraints. This bottleneck is particularly problematic in real-time applications -- such as chatbots and interactive assistants -- where low latency and high memory efficiency are critical. Existing methods drop distant tokens or compress states in a lossy manner, sacrificing accuracy by discarding vital context or introducing bias. We propose MorphKV, an inference-time technique that maintains a constant-sized KV cache while preserving accuracy. MorphKV balances long-range dependencies and local coherence during text generation. It eliminates early-token bias while retaining high-fidelity context by adaptively ranking tokens through correlation-aware selection. Unlike heuristic retention or lossy compression, MorphKV iteratively refines the KV cache via lightweight updates guided by attention patterns of recent tokens. This approach captures inter-token correlation with greater accuracy, crucial for tasks like content creation and code generation. Our studies on long-response tasks show 52.9$\%$ memory savings and 18.2$\%$ higher accuracy on average compared to state-of-the-art prior works, enabling efficient real-world deployment.

Submitted 7 June, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

Comments: Published in the Proceedings of the 42nd International Conference on Machine Learning (ICML), Vancouver, Canada

arXiv:2502.15013 [pdf, other]

Towards Physics-Guided Foundation Models

Authors: Majid Farhadloo, Arun Sharma, Mingzhou Yang, Bharat Jayaprakash, William Northrop, Shashi Shekhar

Abstract: Traditional foundation models are pre-trained on broad datasets to reduce the training resources (e.g., time, energy, labeled samples) needed for fine-tuning a wide range of downstream tasks. However, traditional foundation models struggle with out-of-distribution prediction and can produce outputs that are unrealistic and physically infeasible. We propose the notion of physics-guided foundation models (PGFM), that is, foundation models integrated with broad or general domain (e.g., scientific) physical knowledge applicable to a wide range of downstream tasks.

Submitted 23 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

arXiv:2502.05065 [pdf, other]

Multicenter higher-derivative BPS black holes

Authors: Yide Cai, Sabarenath Jayaprakash, James T. Liu, Robert J. Saskowski

Abstract: We consider the reduction of four-derivative heterotic supergravity on a torus and construct two-charge multicenter BPS black hole solutions. In $d=5$, the three-form field can be dualized to a gauge field and we correspondingly construct three-charge multicenter BPS black hole solutions to the dualized Bergshoeff-de Roo action. This makes precise the embedding of known solutions into five-dimensional $α'$-corrected STU supergravity.

Submitted 7 February, 2025; originally announced February 2025.

Comments: 30 Pages

arXiv:2501.18861 [pdf, other]

doi 10.1109/HPCA61900.2025.00080

QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues

Authors: Jeonghyun Woo, Chris S. Lin, Prashant J. Nair, Aamer Jaleel, Gururaj Saileshwar

Abstract: JEDEC has introduced the Per Row Activation Counting (PRAC) framework for DDR5 and future DRAMs to enable precise counting of DRAM row activations. PRAC enables a holistic mitigation of Rowhammer attacks even at ultra-low Rowhammer thresholds. PRAC uses an Alert Back-Off (ABO) protocol to request the memory controller to issue Rowhammer mitigation requests. However, recent PRAC implementations are either insecure or impractical. For example, Panopticon, the inspiration for PRAC, is rendered insecure if implemented per JEDEC's PRAC specification. On the other hand, the recent UPRAC proposal is impractical since it needs oracular knowledge of the `top-N' activated DRAM rows that require mitigation. This paper provides the first secure, scalable, and practical RowHammer solution using the PRAC framework. The crux of our proposal is the design of a priority-based service queue (PSQ) for mitigations that prioritizes pending mitigations based on activation counts to avoid the security risks of prior solutions. This provides principled security using the reactive ABO protocol. Furthermore, we co-design our PSQ, with opportunistic mitigation on Refresh Management (RFM) operations and proactive mitigation during refresh (REF), to limit the performance impact of ABO-based mitigations. QPRAC provides secure and practical RowHammer mitigation that scales to Rowhammer thresholds as low as 71 while incurring a 0.8% slowdown for benign workloads, which further reduces to 0% with proactive mitigations.

Submitted 15 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

Comments: 15 pages, including appendices. The paper was presented at HPCA 2025 (https://hpca-conf.org/2025/)

Journal ref: 2025 IEEE Symposium on High-Performance Computer Architecture (HPCA 2025)
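
The priority-based service queue mentioned in this abstract can be pictured with a small software toy. The sketch below is only a guess at the general idea (a small structure that keeps the most-activated rows and serves the highest count first); the class name, the replacement rule, and the capacity are assumptions, and it does not reproduce the QPRAC hardware design.

```python
class PriorityServiceQueue:
    """Toy fixed-capacity queue keeping the most-activated rows pending mitigation."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts = {}  # row id -> latest activation count

    def record(self, row, count):
        """Track row if already queued, if there is room, or if it beats the weakest entry."""
        if row in self.counts or len(self.counts) < self.capacity:
            self.counts[row] = count
            return
        weakest = min(self.counts, key=self.counts.get)
        if count > self.counts[weakest]:
            del self.counts[weakest]
            self.counts[row] = count

    def pop_highest(self):
        """Remove and return the queued row with the largest count, or None if empty."""
        if not self.counts:
            return None
        row = max(self.counts, key=self.counts.get)
        del self.counts[row]
        return row


# Hypothetical activity: rows 10-13 with activation counts 5, 9, 7, 3.
psq = PriorityServiceQueue(capacity=2)
for row, count in [(10, 5), (11, 9), (12, 7), (13, 3)]:
    psq.record(row, count)
served = [psq.pop_highest(), psq.pop_highest(), psq.pop_highest()]
```

With a capacity of two, row 12 displaces row 10 and row 13 is never admitted, so the rows are served in order of decreasing activation count.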

arXiv:2501.18857 [pdf, other]

doi 10.1109/HPCA61900.2025.00079

DAPPER: A Performance-Attack-Resilient Tracker for RowHammer Defense

Authors: Jeonghyun Woo, Prashant J. Nair

Abstract: RowHammer vulnerabilities pose a significant threat to modern DRAM-based systems, where rapid activation of DRAM rows can induce bit-flips in neighboring rows. To mitigate this, state-of-the-art host-side RowHammer mitigations typically rely on shared counters or tracking structures. While these optimizations benefit benign applications, they are vulnerable to Performance Attacks (Perf-Attacks), where adversaries exploit shared structures to reduce DRAM bandwidth for co-running benign applications by increasing DRAM accesses for RowHammer counters or triggering repetitive refreshes required for the early reset of structures, significantly degrading performance. In this paper, we propose secure hashing mechanisms to thwart adversarial attempts to capture the mapping of shared structures. We propose DAPPER, a novel low-cost tracker resilient to Perf-Attacks even at ultra-low RowHammer thresholds. We first present a secure hashing template in the form of DAPPER-S. We then develop DAPPER-H, an enhanced version of DAPPER-S, incorporating double-hashing, novel reset strategies, and mitigative refresh techniques. Our security analysis demonstrates the effectiveness of DAPPER-H against both RowHammer and Perf-Attacks. Experiments with 57 workloads from SPEC2006, SPEC2017, TPC, Hadoop, MediaBench, and YCSB show that, even at an ultra-low RowHammer threshold of 500, DAPPER-H incurs only a 0.9% slowdown in the presence of Perf-Attacks while using only 96KB of SRAM per 32GB of DRAM memory.

Submitted 15 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

Comments: The initial version of this paper was submitted to MICRO 2024 on April 18, 2024. The final version was presented at HPCA 2025 (https://hpca-conf.org/2025) and is 16 pages long, including references

Journal ref: 2025 IEEE Symposium on High-Performance Computer Architecture (HPCA 2025)

arXiv:2501.16762 [pdf, other]

Rate-Distortion under Neural Tracking of Speech: A Directed Redundancy Approach

Authors: Jan Østergaard, Sangeeth Geetha Jayaprakash, Rodrigo Ordoñez

Abstract: The data acquired at different scalp EEG electrodes when human subjects are exposed to speech stimuli are highly redundant. The redundancy is partly due to volume conduction effects and partly due to localized regions of the brain synchronizing their activity in response to the stimuli. In a competing talker scenario, we use a recent measure of directed redundancy to assess the amount of redundant information that is causally conveyed from the attended stimuli to the left temporal region of the brain. We observe that for the attended stimuli, the transfer entropy as well as the directed redundancy is proportional to the correlation between the speech stimuli and the reconstructed signal from the EEG signals. This demonstrates that both the rate as well as the rate-redundancy are inversely proportional to the distortion in neural speech tracking. Thus, a greater rate indicates a greater redundancy between the electrode signals, and a greater correlation between the reconstructed signal and the attended stimuli. A similar relationship is not observed for the distracting stimuli.

Submitted 28 January, 2025; originally announced January 2025.

Comments: Accepted for IEEE Data Compression Conference

arXiv:2412.05649 [pdf, other]

RouteNet-Fermi: Network Modeling With GNN (Analysis And Re-implementation)

Authors: Shourya Verma, Simran Kadadi, Swathi Jayaprakash, Arpan Kumar Mahapatra, Ishaan Jain

Abstract: Network performance modeling presents important challenges in modern computer networks due to increasing complexity, scale, and diverse traffic patterns. While traditional approaches like queuing theory and packet-level simulation have served as foundational tools, they face limitations in modeling complex traffic behaviors and scaling to large networks. This project presents an extended implementation of RouteNet-Fermi, a Graph Neural Network (GNN) architecture designed for network performance prediction, with additional recurrent neural network variants. We improve the original architecture by implementing Long Short-Term Memory (LSTM) cells and Recurrent Neural Network (RNN) cells alongside the existing Gated Recurrent Unit (GRU) cells implementation. This work contributes to the understanding of recurrent neural architectures in GNN-based network modeling and provides a flexible framework for future experimentation with different cell types.

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2412.03853 [pdf, other]

Automated LaTeX Code Generation from Handwritten Math Expressions Using Vision Transformer

Authors: Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado

Abstract: Transforming mathematical expressions into LaTeX poses a significant challenge. In this paper, we examine the application of advanced transformer-based architectures to address the task of converting handwritten or digital mathematical expression images into corresponding LaTeX code. As a baseline, we utilize the current state-of-the-art CNN encoder and LSTM decoder. Additionally, we explore enhancements to the CNN-RNN architecture by replacing the CNN encoder with the pretrained ResNet50 model, modified to suit the grayscale input. Further, we experiment with a vision transformer model and compare it with the baseline and the CNN-LSTM model. Our findings reveal that the vision transformer architectures outperform the baseline CNN-RNN framework, delivering higher overall accuracy and BLEU scores while achieving lower Levenshtein distances. Moreover, these results highlight the potential for further improvement through fine-tuning of model parameters. To encourage open research, we also provide the model implementation, enabling reproduction of our results and facilitating further research in this domain.

Submitted 7 December, 2024; v1 submitted 4 December, 2024; originally announced December 2024.

Comments: 7 pages; 3 figures

arXiv:2409.12432 [pdf, other]

doi 10.1109/MICRO61859.2024.00060

Qoncord: A Multi-Device Job Scheduling Framework for Variational Quantum Algorithms

Authors: Meng Wang, Poulami Das, Prashant J. Nair

Abstract: Quantum computers face challenges due to limited resources, particularly in cloud environments. Despite these obstacles, Variational Quantum Algorithms (VQAs) are considered promising applications for present-day Noisy Intermediate-Scale Quantum (NISQ) systems. VQAs require multiple optimization iterations to converge on a globally optimal solution. Moreover, these optimizations, known as restarts, need to be repeated from different points to mitigate the impact of noise. Unfortunately, the job scheduling policies for each VQA task in the cloud are heavily unoptimized. Notably, each VQA execution instance is typically scheduled on a single NISQ device. Given the variety of devices in the cloud, users often prefer higher-fidelity devices to ensure higher-quality solutions. However, this preference leads to increased queueing delays and unbalanced resource utilization. We propose Qoncord, an automated job scheduling framework to address these cloud-centric challenges for VQAs. Leveraging the insight that not all training iterations and restarts are equal, Qoncord strategically divides the training process into exploratory and fine-tuning phases. Early exploratory iterations, more resilient to noise, are executed on less busy machines, while fine-tuning occurs on high-fidelity machines. This adaptive approach mitigates the impact of noise and optimizes resource usage and queuing delays in cloud environments. Qoncord also significantly reduces execution time and minimizes restart overheads by eliminating low-performance iterations. Thus, Qoncord offers similar solutions 17.4x faster. Similarly, it can offer 13.3% better solutions for the same time budget as the baseline.

Submitted 26 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

Comments: This paper has been accepted at the 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)

Journal ref: 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)

arXiv:2408.00578 [pdf, other]

Propagation of Enzyme-driven Active Fluctuations in Crowded Milieu

Authors: Rik Chakraborty, Arnab Maiti, Diptangshu Paul, Rajnandan Borthakur, K. R. Jayaprakash, Uddipta Ghosh, Krishna Kanti Dey

Abstract: We investigated the energy transfer from active enzymes to their surroundings in crowded environments by measuring the diffusion of passive microscopic tracers in active solutions of ficoll and glycerol. Despite observing lower rates of substrate turnover and relatively smaller enhancement of passive tracer diffusion in artificial crowded media compared to those in aqueous solutions, we found a significantly higher relative diffusion enhancement in crowded environments in the presence of enzymatic activity. Our experimental observations, coupled with supporting analytical estimations, underscored the critical role of the intervening media in facilitating mechanical energy distribution around active enzymes.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2408.00111 [pdf, other]

Femtosecond switching of strong light-matter interactions in microcavities with two-dimensional semiconductors

Authors: Armando Genco, Charalambos Louca, Cristina Cruciano, Kok Wee Song, Chiara Trovatello, Giuseppe Di Blasio, Giacomo Sansone, Sam Randerson, Peter Claronino, Rahul Jayaprakash, Kenji Watanabe, Takashi Taniguchi, David G. Lidzey, Oleksandr Kyriienko, Stefano Dal Conte, Alexander I. Tartakovskii, Giulio Cerullo

Abstract: Ultrafast all-optical logic devices based on nonlinear light-matter interactions hold the promise to overcome the speed limitations of conventional electronic devices. Strong coupling of excitons and photons inside an optical resonator enhances such interactions and generates new polariton states which give access to unique nonlinear phenomena, such as Bose-Einstein condensation, used for all-optical ultrafast polariton transistors. However, the pulse energies required to pump such devices range from tens to hundreds of pJ, making them not competitive with electronic transistors. Here we introduce a new paradigm for all-optical switching based on the ultrafast transition from the strong to the weak coupling regime in microcavities embedding atomically thin transition metal dichalcogenides. Employing single and double stacks of hBN-encapsulated MoS$_2$ homobilayers with high optical nonlinearities and fast exciton relaxation times, we observe a collapse of the 55-meV polariton gap and its revival in less than one picosecond, lowering the threshold for optical switching below 4 pJ per pulse, while retaining ultrahigh switching frequencies. As an additional degree of freedom, the switching can be triggered by pumping either the intra- or the interlayer excitons of the bilayers at different wavelengths, speeding up the polariton dynamics, owing to unique interspecies excitonic interactions. Our approach will enable the development of compact ultrafast all-optical logical circuits and neural networks, showcasing a new platform for polaritonic information processing based on manipulating the light-matter coupling.

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2407.14490 [pdf, other]

doi 10.1145/3620665.3640363

Red-QAOA: Efficient Variational Optimization through Circuit Reduction

Authors: Meng Wang, Bo Fang, Ang Li, Prashant Nair

Abstract: The Quantum Approximate Optimization Algorithm (QAOA) addresses combinatorial optimization challenges by converting inputs to graphs. However, the optimal parameter searching process of QAOA is greatly affected by noise. Larger problems yield bigger graphs, requiring more qubits and making their outcomes highly noise-sensitive. This paper introduces Red-QAOA, leveraging energy landscape concentration via a simulated annealing-based graph reduction. Red-QAOA creates a smaller (distilled) graph with nearly identical parameters to the original graph. The distilled graph produces a smaller quantum circuit and thus reduces noise impact. At the end of the optimization, Red-QAOA employs the parameters from the distilled graph on the original graph and continues the parameter search on the original graph. Red-QAOA outperforms state-of-the-art Graph Neural Network (GNN)-based pooling techniques on 3200 real-world problems. Red-QAOA reduced node and edge counts by 28% and 37%, respectively, with a mean square error of only 2%.

Submitted 21 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

ACM Class: C.m; G.m

Journal ref: In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 2024 Apr 27 (pp. 980-998)

arXiv:2406.14600 [pdf, other]

Higher derivative heterotic supergravity on a torus and supersymmetry

Authors: Sabarenath Jayaprakash, James T. Liu

Abstract: Ignoring ten-dimensional heterotic gauge fields, heterotic supergravity reduced on a $d$-dimensional torus gives rise to a half-maximal supergravity coupled to $d$ vector multiplets. The reduced theory has a continuous $O(d,d;\mathbb R)/O(d)_-\times O(d)_+$ symmetry that persists to all perturbative orders in the string $α'$ expansion. We highlight this symmetry by explicitly reducing the bosonic sector of four-derivative heterotic supergravity as well as its fermionic supersymmetry variations. After appropriate field redefinitions, the resulting action and supersymmetry variations are manifestly $O(d)_-\times O(d)_+$ invariant. This reduction allows us to explore the interplay between the gravity and vector multiplets beyond leading order, where (in our conventions) $O(d)_-$ is the supergravity R-symmetry while $O(d)_+$ is a flavor symmetry of the $d$ vector multiplets.

Submitted 17 September, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 26 pages, references added

Report number: LCTP-24-11

arXiv:2406.00013 [pdf]

Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval

Authors: Jayaprakash Sundararaj

Abstract: Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems: i) summarizing a text document as a set of keywords/captions for image recommendation, and ii) generating an opinion summary with a good mix of relevancy and sentiment with respect to the text document. Initially, we present our work on recommending images for enhancing a substantial amount of existing plain text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract key-phrases, which are re-ranked using a rank aggregation framework with a relevance feedback mechanism. We show that such rank aggregation and relevance feedback, which are typically used in tagging documents and text information retrieval, also help in improving image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images. Our proposed method is observed to perform better than all existing baselines. Additionally, we propose a set of submodular functions for opinion summarization. Opinion summarization has built into it the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and simultaneously extract a summary. The two tasks conflict in the sense that the demand of compression may drop sentiment-bearing sentences, and the demand of sentiment detection may bring in redundant sentences. However, using submodularity we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment along with a good ROUGE score. We also compare the performances of the proposed submodular functions.

Submitted 20 May, 2024; originally announced June 2024.

arXiv:2405.14579 [pdf, other]

Assessment of the Role and Origin of S* in Orange Carotenoid Protein Photoconversion

Authors: James P. Pidgeon, George A. Sutherland, Matthew S. Proctor, Shuangqing Wang, Dimitri Chekulaev, Sayantan Bhattacharya, Rahul Jayaprakash, Andrew Hitchcock, Ravi Kumar Venkatraman, Matthew P. Johnson, C. Neil Hunter, Jenny Clark

Abstract: The orange carotenoid protein (OCP) is the water-soluble mediator of non-photochemical quenching in cyanobacteria, a crucial photoprotective mechanism in response to excess illumination. OCP converts from a globular, inactive state (OCPo) to an extended, active conformation (OCPr) under high-light conditions, resulting in a concomitant redshift in the absorption of the bound carotenoid. Here, OCP was trapped in either the active or inactive state by fixing each protein conformation in trehalose-sucrose glass. Glass-encapsulated OCPo did not convert under intense illumination and OCPr did not convert in darkness, allowing the optical properties of each conformation to be determined at room temperature. We measured pump wavelength-dependent transient absorption of OCPo in glass films and found that initial OCP photoproducts are still formed, despite the glass preventing completion of the photocycle. By comparison to the pump wavelength dependence of the OCPo to OCPr photoconversion yield in buffer, we show that the long-lived carotenoid singlet-like feature (S*) is associated with ground-state heterogeneity within OCPo, rather than triggering OCP photoconversion.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06579 [pdf, other]

Terahertz Antenna Impedance Matched to a Graphene Photodetector

Authors: François Joint, Kunyi Zhang, Jayaprakash Poojali, Daniel Lewis, Michael Pedowitz, Brendan Jordan, Gyan Prakash, Ashraf Ali, Kevin Daniels, Rachael L. Myers-Ward, Thomas E. Murphy, Howard D. Drew

Abstract: Developing low-power, high-sensitivity photodetectors for the terahertz (THz) band that operate at room temperature is an important challenge in optoelectronics. In this study, we introduce a photo-thermal-electric (PTE) effect detector based on quasi-free standing bilayer graphene (BLG) on a silicon carbide (SiC) substrate, designed for the THz frequency range. Our detector's performance hinges on a quasi-optical coupling scheme, which integrates an aspherical silicon lens, to optimize impedance matching between the THz antenna and the graphene p-n junction. At room temperature, we achieved a noise equivalent power (NEP) of less than 300 $pW/\sqrt{Hz}$. Through an impedance matching analysis, we coupled a planar antenna with a graphene p-n junction, inserted in parallel to the nano-gap of the antenna, via two coupling capacitors. By adjusting the capacitors and the antenna arm length, we tailored the antenna's maximum infrared power absorption to specific frequencies. The sensitivity, spectral properties, and scalability of our material make it an ideal candidate for future development of far-infrared detectors operating at room temperature.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 21 pages, 4 figures

arXiv:2404.04270 [pdf, other]

Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings

Authors: Yassaman Ebrahimzadeh Maboud, Muhammad Adnan, Divya Mahajan, Prashant J. Nair

Abstract: Training recommendation models poses significant challenges regarding resource utilization and performance. Prior research has proposed an approach that categorizes embeddings into popular and non-popular classes to reduce the training time for recommendation models. We observe that, even among the popular embeddings, certain embeddings undergo rapid training and exhibit minimal subsequent variation, resulting in saturation. Consequently, updates to these embeddings lack any contribution to model quality. This paper presents Slipstream, a software framework that identifies stale embeddings on the fly and skips their updates to enhance performance. This capability enables Slipstream to achieve substantial speedup, optimize CPU-GPU bandwidth usage, and eliminate unnecessary memory access. Slipstream showcases training time reductions of 2x, 2.4x, 1.2x, and 1.175x across real-world datasets and configurations, compared to Baseline XDL, Intel-optimized DRLM, FAE, and Hotline, respectively.

Submitted 21 March, 2024; originally announced April 2024.

arXiv:2403.10905 [pdf, other]

Computational Seismic Fracture Synthesis of Tidal Barrage using Enhanced Isotropic Plasticity Damage Mechanics and Coupled Lagrangian-Eulerian Multiphase Interaction

Authors: Sayan Chowdhury, Satya Kiran Raju Alluri, Jayaprakash J, Fang Yenn Teo, Umashankar M

Abstract: Mega-engineered hydraulic structures like dams and barrages are critically sensitive to strong ground motion if constructed within the vicinity of triggered fault lines. Collapse post excessive deformation leads to severe environmental impact. In this study, fracture corresponding to the response of a concrete tidal barrage to strong ground motion is analyzed along with behavioral effects due to reservoir-barrage dynamic interaction. An enhanced version of the plasticity damage mechanical model, which includes effects due to degradation of elastic stiffness of concrete as well as restoration of fracture energy losses, is assigned as material behavior. The fluid-structure interaction is solved using an idealized Lagrangian-Eulerian formulation. The proposed improvised numerical formulations are validated against benchmark simulations performed on the Koyna dam situated in Maharashtra, India, and the results captured are up to 94% accurate. Finite element simulation of a tidal barrage is performed using a computationally stable mesh with a global grid-to-length ratio of 4.2. The yield surface captured is elliptical in nature, and fracture is observed to be propagating from the bottom of the gate housing, covering up to four nodal integration points.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 30 pages, 15 figures, 1 table, 1 algorithm

arXiv:2403.09054 [pdf, other]

Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference

Authors: Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath

Abstract: Transformers have emerged as the underpinning architecture for Large Language Models (LLMs). In generative language models, the inference process involves two primary phases: prompt processing and token generation. Token generation, which constitutes the majority of the computational workload, primarily entails vector-matrix multiplications and interactions with the Key-Value (KV) Cache. This phase is constrained by memory bandwidth due to the overhead of transferring weights and KV cache values from the memory system to the computing units. This memory bottleneck becomes particularly pronounced in applications that require long-context and extensive text generation, both of which are increasingly crucial for LLMs. This paper introduces "Keyformer", an innovative inference-time approach, to mitigate the challenges associated with KV cache size and memory bandwidth utilization. Keyformer leverages the observation that approximately 90% of the attention weight in generative inference focuses on a specific subset of tokens, referred to as "key" tokens. Keyformer retains only the key tokens in the KV cache by identifying these crucial tokens using a novel score function. This approach effectively reduces both the KV cache size and memory bandwidth usage without compromising model accuracy. We evaluate Keyformer's performance across three foundational models: GPT-J, Cerebras-GPT, and MPT, which employ various positional embedding algorithms. Our assessment encompasses a variety of tasks, with a particular emphasis on summarization and conversation tasks involving extended contexts. Keyformer's reduction of KV cache reduces inference latency by 2.1x and improves token generation throughput by 2.4x, while preserving the model's accuracy.

Submitted 5 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

MSC Class: 68U35

ACM Class: I.2.7; C.0

Journal ref: Proceedings of the 7th Annual Conference on Machine Learning and Systems (MLSys), 2024

arXiv:2402.00197 [pdf, other]

doi 10.1021/acs.est.3c06447

Determination of Trace Organic Contaminant Concentration via Machine Classification of Surface-Enhanced Raman Spectra

Authors: Vishnu Jayaprakash, Jae Bem You, Chiranjeevi Kanike, Jinfeng Liu, Christopher McCallum, Xuehua Zhang

Abstract: Accurate detection and analysis of traces of persistent organic pollutants in water is important in many areas, including environmental monitoring and food quality control, due to their long environmental stability and potential bioaccumulation. While conventional analysis of organic pollutants requires expensive equipment, surface enhanced Raman spectroscopy (SERS) has demonstrated great potential for accurate detection of these contaminants. However, SERS analytical difficulties, such as spectral preprocessing, denoising, and substrate-based spectral variation, have hindered widespread use of the technique. Here, we demonstrate an approach for predicting the concentration of sample pollutants from messy, unprocessed Raman data using machine learning. Frequency domain transform methods, including the Fourier and Walsh-Hadamard transforms, are applied to sets of Raman spectra of three model micropollutants in water (rhodamine 6G, chlorpyrifos, and triclosan), which are then used to train machine learning algorithms. Using standard machine learning models, the concentration of sample pollutants is predicted with more than 80 percent cross-validation accuracy from raw Raman data. Cross-validation accuracy of 85 percent was achieved using deep learning for a moderately sized dataset (100 spectra), and 70 to 80 percent cross-validation accuracy was achieved even for very small datasets (50 spectra). Additionally, standard models were shown to accurately identify characteristic peaks via analysis of their importance scores. The approach shown here has the potential to be applied to facilitate accurate detection and analysis of persistent organic pollutants by surface-enhanced Raman spectroscopy.

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2310.01301 [pdf, ps, other]

Short Time Angular Impulse Response of Rayleigh Beams

Authors: Bidhayak Goswami, K. R. Jayaprakash, Anindya Chatterjee

Abstract: In the dynamics of linear structures, the impulse response function is of fundamental interest. In some cases one examines the short term response wherein the disturbance is still local and the boundaries have not yet come into play, and for such short-time analysis the geometrical extent of the structure may be taken as unbounded. Here we examine the response of slender beams to angular impulses. The Euler-Bernoulli model, which does not include rotary inertia of cross sections, predicts an unphysical and unbounded initial rotation at the point of application. A finite length Euler-Bernoulli beam, when modelled using finite elements, predicts a mesh-dependent response that shows fast large-amplitude oscillations setting in very quickly. The simplest introduction of rotary inertia yields the Rayleigh beam model, which has more reasonable behaviour including a finite wave speed at all frequencies. If a Rayleigh beam is given an impulsive moment at a location away from its boundaries, then the predicted behaviour has an instantaneous finite jump in local slope or rotation, followed by smooth evolution of the slope for a finite time interval until reflections arrive from the boundary, causing subsequent slope discontinuities in time. We present a detailed study of the angular impulse response of a simply supported Rayleigh beam, starting with dimensional analysis, followed by modal expansion including all natural frequencies, culminating with an asymptotic formula for the short-time response. The asymptotic formula is obtained by breaking the series solution into two parts to be treated independently term by term, and leads to a polynomial in time. The polynomial matches the response from refined finite element (FE) simulations.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2308.14902 [pdf, other]

Ad-Rec: Advanced Feature Interactions to Address Covariate-Shifts in Recommendation Networks

Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Abstract: Recommendation models are vital in delivering personalized user experiences by leveraging the correlation between multiple input features. However, deep learning-based recommendation models often face challenges due to evolving user behaviour and item features, leading to covariate shifts. Effective cross-feature learning is crucial to handle data distribution drift and adapt to changing user behaviour. Traditional feature interaction techniques have limitations in achieving optimal performance in this context. This work introduces Ad-Rec, an advanced network that leverages feature interaction techniques to address covariate shifts. This helps eliminate irrelevant interactions in recommendation tasks. Ad-Rec leverages masked transformers to enable the learning of higher-order cross-features while mitigating the impact of data distribution drift. Our approach improves model quality, accelerates convergence, and reduces training time, as measured by the Area Under Curve (AUC) metric. We demonstrate the scalability of Ad-Rec and its ability to achieve superior model quality through comprehensive ablation studies.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2307.02623 [pdf, other]

FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout

Authors: Irene Wang, Prashant J. Nair, Divya Mahajan

Abstract: Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.

Submitted 26 September, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

arXiv:2304.13807 [pdf, other]

A Survey on Solving and Discovering Differential Equations Using Deep Neural Networks

Authors: Hyeonjung, Jung, Jayant Gupta, Bharat Jayaprakash, Matthew Eagon, Harish Panneer Selvam, Carl Molnar, William Northrop, Shashi Shekhar

Abstract: Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.

Submitted 19 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Under review for ACM Computing Surveys journal. 29 pages

arXiv:2304.08766 [pdf]

Cashew dataset generation using augmentation and RaLSGAN and a transfer learning based tinyML approach towards disease detection

Authors: Varsha Jayaprakash, Akilesh K, Ajay kumar, Balamurugan M. S, Manoj Kumar Rajagopal

Abstract: Cashew is one of the most extensively consumed nuts in the world, and it is also known as a cash crop. A tree may generate a substantial yield in a few months and has a lifetime of around 70 to 80 years. Yet, in addition to the benefits, there are certain constraints to its cultivation. With the exception of parasites and algae, anthracnose is the most common disease affecting trees. When it comes to cashew, the dense structure of the tree makes it difficult to diagnose the disease with ease compared to short crops. Hence, we present a dataset that exclusively consists of healthy and diseased cashew leaves and fruits. The dataset is authenticated by adding RGB color transformation to highlight diseased regions, photometric and geometric augmentations, and RaLSGAN to enlarge the initial collection of images and boost performance in real-time situations when working with a constrained dataset. Further, transfer learning is used to test the classification efficiency of the dataset using algorithms such as MobileNet and Inception. TensorFlow lite is utilized to develop these algorithms for disease diagnosis utilizing drones in real-time. Several post-training optimization strategies are utilized, and their memory size is compared. They have proven their effectiveness by delivering high accuracy (up to 99%) and a decrease in memory and latency, making them ideal for use in applications with limited resources.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2302.04085 [pdf]

Ultrafast optical control of polariton energy in an organic semiconductor microcavity

Authors: Kirsty E. McGhee, Michele Guizzardi, Rahul Jayaprakash, Kyriacos Georgiou, Till Jessewitsch, Ullrich Scherf, Giulio Cerullo, Anton Zasedatelev, Tersilla Virgili, Pavlos G. Lagoudakis, David G. Lidzey

Abstract: The manipulation of exciton-polaritons and their condensates is of great interest due to their applications in polariton simulators and high-speed, all-optical logic devices. Until now, methods of trapping and manipulating such condensates are not dynamically reconfigurable or result in an undesirable reduction in the exciton-photon coupling strength. Here, we present a new strategy for the ultrafast control of polariton resonances via transient modification of an optical cavity mode. We have constructed multilayer organic semiconductor microcavities that contain two absorbers: one strongly- and one weakly-coupled to the cavity photon mode. By selectively exciting the weakly-coupled absorber with ultrashort laser pulses, we modulate the cavity refractive index and generate fully-reversible blueshifts of the lower polariton branch by up to 8 meV in sub-ps timescales with no corresponding reduction in the exciton-photon coupling strength. Our work demonstrates the ability to manipulate polariton energy landscapes over ultrafast timescales with important applications in emerging computing technologies.

Submitted 4 May, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.09023 [pdf, other]

Study of Unsteadiness due to 3-D Shock-Boundary Layer Interaction in Flow over a Square-faced Protuberance

Authors: Ramachandra K, Sourabh Bhardwaj, Jayaprakash N. Murugan, Sriram R

Abstract: The dynamics of shock-induced unsteady separated flow past a three-dimensional square-faced protuberance are investigated through wind tunnel experiments. Time-resolved schlieren imaging and unsteady surface pressure measurements are the diagnostics employed. Dynamic Mode Decomposition (DMD) of schlieren snapshots, and analysis of spectrum and correlations in pressure data are used to characterize and resolve the flow physics. The mean shock foot in the centreline is found to exhibit a Strouhal number of around 0.01, which is also the order of magnitude of the Strouhal numbers reported in the literature for two-dimensional shock-boundary layer interactions. The wall pressure spectra, in general, shift towards lower frequencies as we move away from the (spanwise) centreline, with some variation in the nature of the peaks. The cross-correlation analysis shows the strong dependence of the mean shock oscillations on the plateau region, and disturbances are found to travel upstream from inside the separation bubble. Good coherence is observed between the spanwise mean shock foot locations up to a Strouhal number of about 0.015, indicating that the 3-D shock foot largely moves to-and-fro in a coherent fashion.

Submitted 21 January, 2023; originally announced January 2023.

Comments: 16 pages, 21 figures

arXiv:2212.12613 [pdf, other]

doi 10.1109/HPCA56546.2023.10070999

Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems

Authors: Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair

Abstract: As Dynamic Random Access Memories (DRAM) scale, they are becoming increasingly susceptible to Row Hammer. By rapidly activating rows of DRAM cells (aggressor rows), attackers can exploit inter-cell interference through Row Hammer to flip bits in neighboring rows (victim rows). A recent work, called Randomized Row-Swap (RRS), proposed proactively swapping aggressor rows with randomly selected rows before an aggressor row can cause Row Hammer. Our paper observes that RRS is neither secure nor scalable. We first propose the 'Juggernaut attack pattern' that breaks RRS in under 1 day. Juggernaut exploits the fact that the mitigative action of RRS, a swap operation, can itself induce additional target row activations, defeating such a defense. Second, this paper proposes a new defense, the Secure Row-Swap mechanism, which avoids the additional activations from swap (and unswap) operations and protects against Juggernaut. Furthermore, this paper extends Secure Row-Swap with attack detection to defend against even future attacks. While this provides better security, it also allows for securely reducing the frequency of swaps, thereby enabling Scalable and Secure Row-Swap. The Scalable and Secure Row-Swap mechanism provides years of Row Hammer protection with 3.3X lower storage overheads as compared to the RRS design. It incurs only a 0.7% slowdown as compared to a not-secure baseline for a Row Hammer threshold of 1200.

Submitted 23 December, 2022; originally announced December 2022.

Journal ref: The 29th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2023)

arXiv:2207.10793 [pdf, other]

doi 10.1145/3630614.3630616

The Dirty Secret of SSDs: Embodied Carbon

Authors: Swamit Tannu, Prashant J. Nair

Abstract: Scalable Solid-State Drives (SSDs) have ushered in a transformative era in data storage and accessibility, spanning both data centers and portable devices. However, the strides made in scaling this technology can bear significant environmental consequences. On a global scale, a notable portion of semiconductor manufacturing relies on electricity derived from coal and natural gas sources. A striking example of this is the manufacturing process for a single Gigabyte of Flash memory, which emits approximately 0.16 kg of CO2 - a considerable fraction of the total carbon emissions attributed to the system. Remarkably, the manufacturing of storage devices alone contributed to an estimated 20 million metric tonnes of CO2 emissions in the year 2021. In light of these environmental concerns, this paper delves into an analysis of the sustainability trade-offs inherent in Solid-State Drives (SSDs) when compared to traditional Hard Disk Drives (HDDs). Moreover, this study proposes methodologies to gauge the embodied carbon costs associated with storage systems effectively. The research encompasses four key strategies to enhance the sustainability of storage systems. In summation, this paper critically addresses the embodied carbon issues associated with SSDs, comparing them with HDDs, and proposes a comprehensive framework of strategies to enhance the sustainability of storage systems.

Submitted 28 September, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Journal ref: Energy Informatics Review (Volume 3 Issue 3, October 2023)

arXiv:2204.05436 [pdf, other]

Heterogeneous Acceleration Pipeline for Recommendation System Training

Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Abstract: Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode utilizes High Bandwidth Memory (HBM) across multiple GPUs for storing embedding tables. However, this approach is expensive and presents scaling concerns. This paper introduces Hotline, a heterogeneous acceleration pipeline that addresses these concerns. Hotline develops a data-aware and model-aware scheduling pipeline by leveraging the insight that only a few embedding entries are frequently accessed (popular). This approach utilizes CPU main memory for non-popular embeddings and GPUs' HBM for popular embeddings. To achieve this, the Hotline accelerator fragments a mini-batch into popular and non-popular micro-batches. It gathers the necessary working parameters for non-popular micro-batches from the CPU, while GPUs execute popular micro-batches. The hardware accelerator dynamically coordinates the execution of popular embeddings on GPUs and non-popular embeddings from the CPU's main memory. Real-world datasets and models confirm Hotline's effectiveness, reducing average end-to-end training time by 2.2x compared to an Intel-optimized CPU-GPU DLRM baseline.

Submitted 28 April, 2024; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted at The International Symposium on Computer Architecture (ISCA), 2024

arXiv:2204.00485 [pdf, other]

doi 10.1038/s41467-023-39358-9

Nonlinear interactions of dipolar excitons and polaritons in MoS2 bilayers

Authors: Charalambos Louca, Armando Genco, Salvatore Chiavazzo, Thomas P. Lyons, Sam Randerson, Chiara Trovatello, Peter Claronino, Rahul Jayaprakash, Kenji Watanabe, Takashi Taniguchi, Stefano Dal Conte, David G. Lidzey, Giulio Cerullo, Oleksandr Kyriienko, Alexander I. Tartakovskii

Abstract: Nonlinear interactions between excitons strongly coupled to light are key for accessing quantum many-body phenomena in polariton systems. Atomically-thin two-dimensional semiconductors provide an attractive platform for strong light-matter coupling owing to many controllable excitonic degrees of freedom. Among these, the recently emerged exciton hybridization opens access to unexplored excitonic species, with a promise of enhanced interactions. Here, we employ hybridized interlayer excitons (hIX) in bilayer MoS2 to achieve highly nonlinear excitonic and polaritonic effects. Such interlayer excitons possess an out-of-plane electric dipole as well as an unusually large oscillator strength, allowing observation of dipolar polaritons (dipolaritons) in bilayers in optical microcavities. Compared to excitons and polaritons in MoS2 monolayers, both hIX and dipolaritons exhibit about 8 times higher nonlinearity, which is further strongly enhanced when hIX and intralayer excitons, sharing the same valence band, are excited simultaneously. This gives rise to a highly nonlinear regime which we describe theoretically by introducing a concept of hole crowding. The presented insight into many-body interactions provides new tools for accessing few-polariton quantum correlations.

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2203.13892 [pdf, other]

doi 10.1145/3695053.3730992

Accelerating Simulation of Quantum Circuits under Noise via Computational Reuse

Authors: Meng Wang, Swamit Tannu, Prashant J. Nair

Abstract: To realize the full potential of quantum computers, we must mitigate qubit errors by developing noise-aware algorithms, compilers, and architectures. Thus, simulating quantum programs on high-performance computing (HPC) systems with different noise models is a de facto tool researchers use. Unfortunately, noisy simulators iteratively execute a similar circuit for thousands of trials, thereby incurring significant performance overheads. To address this, we propose a noisy simulation technique called Tree-Based Quantum Circuit Simulation (TQSim). TQSim exploits the reusability of intermediate results during the noisy simulation, reducing computation. TQSim dynamically partitions a circuit into several subcircuits. It then reuses the intermediate results from these subcircuits during computation. Compared to a noisy Qulacs-based baseline simulator, TQSim achieves a speedup of up to 3.89x for noisy simulations. TQSim is designed to be efficient with multi-node setups while also maintaining tight fidelity bounds.

Submitted 19 May, 2025; v1 submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted for publication in the Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA '25). Manuscript length: 15 pages

arXiv:2112.02141 [pdf]

doi 10.1002/advs.202105569

Tuning the coherent propagation of organic exciton-polaritons through dark state delocalization

Authors: Raj Pandya, Arjun Ashoka, Kyriacos Georgiou, Jooyoung Sung, Rahul Jayaprakash, Scott Renken, Lizhi Gai, Zhen Shen, Akshay Rao, Andrew Musser

Abstract: While there have been numerous reports of long-range polariton transport at room-temperature in organic cavities, the spatio-temporal evolution of the propagation is scarcely reported, particularly in the initial coherent sub-ps regime, where photon and exciton wavefunctions are inextricably mixed. Hence the detailed process of coherent organic exciton-polariton transport and in particular the role of dark states has remained poorly understood. Here, we use femtosecond transient absorption microscopy to directly image coherent polariton motion in microcavities of varying quality factor. We find the transport to be well-described by a model of band-like propagation of an initially Gaussian distribution of exciton-polaritons in real space. The velocity of the polaritons reaches values of ~0.65 x 10^6 m s^-1, substantially lower than expected from the polariton dispersion. Further, we find that the velocity is proportional to the quality factor of the microcavity. We suggest that this unexpected link between the quality factor and polariton velocity, together with the slow coherent transport, is a result of varying admixing between delocalised dark and polariton states.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2111.01354 [pdf, other]

SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection

Authors: Siddhartha Gairola, Murtuza Bohra, Nadeem Shaheer, Navya Jayaprakash, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it a leading cause of partial/complete blindness among such populations. We propose SmartKC, a low-cost, smartphone-based keratoconus diagnosis system comprising a 3D-printed placido's disc attachment, an LED light strip, and an intelligent smartphone app to capture the reflection of the placido rings on the cornea. An image processing pipeline analyzes the corneal image and uses the smartphone's camera parameters, the placido rings' 3D location, the pixel location of the reflected placido rings and the setup's working distance to construct the corneal surface, via the Arc-Step method and Zernike-polynomial-based surface fitting. In a clinical study with 101 distinct eyes, we found that SmartKC achieves a sensitivity of 94.1% and a specificity of 100.0%. Moreover, the quantitative curvature estimates (sim-K) strongly correlate with a gold-standard medical device (Pearson correlation coefficient = 0.78). Our results indicate that SmartKC has the potential to be used as a keratoconus screening tool under real-world medical settings.

Submitted 21 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Change Log: + Fixed sim-K computation (updated Section 5.5.3); re-ran our pipeline with the updated sim-K values (updated Figure 7); + Conducted the comparative evaluation with doctors again (total 4 doctors), and got improved results (updated Section 7.2 and Table 2); [Note: This is an updated version of the paper that was accepted for publication in IMWUT 2021.]

arXiv:2107.05708 [pdf]

doi 10.1063/5.0063173

Untargeted Effects in Organic Exciton-Polariton Transient Spectroscopy: A Cautionary Tale

Authors: Scott Renken, Raj Pandya, Kyriacos Georgiou, Rahul Jayaprakash, Lizhi Gai, Zhen Shen, David G. Lidzey, Akshay Rao, Andrew J Musser

Abstract: Strong light-matter coupling to form exciton- and vibropolaritons is increasingly touted as a powerful tool to alter the fundamental properties of organic materials. It is proposed that these states and their facile tunability can be used to rewrite molecular potential energy landscapes and redirect photophysical pathways, with applications from catalysis to electronic devices. Crucial to their photophysical properties is the exchange of energy between coherent, bright polaritons and incoherent dark states. One of the most potent tools to explore this interplay is transient absorption/reflectance spectroscopy. Previous studies have revealed unexpectedly long lifetimes of the coherent polariton states, for which there is no theoretical explanation. Applying these transient methods to a series of strong-coupled organic microcavities, we recover similar long-lived spectral effects. Based on transfer-matrix modelling of the transient experiment, we find that virtually the entire photoresponse results from photoexcitation effects other than the generation of polariton states. Our results suggest that the complex optical properties of polaritonic systems make them especially prone to misleading optical signatures, and that more challenging high-time-resolution measurements on high-quality microcavities are necessary to uniquely distinguish the coherent polariton dynamics.

Submitted 12 July, 2021; originally announced July 2021.

Journal ref: J. Chem. Phys. 155, 154701 (2021)

arXiv:2106.15034 [pdf, other]

Approximation Schemes for Capacitated Vehicle Routing on Graphs of Bounded Treewidth, Bounded Doubling, or Highway Dimension

Authors: Aditya Jayaprakash, Mohammad R. Salavatipour

Abstract: In this paper, we present Approximation Schemes for the Capacitated Vehicle Routing Problem (CVRP) on several classes of graphs. In CVRP, introduced by Dantzig and Ramser (1959), we are given a graph $G=(V,E)$ with metric edge costs, a depot $r\in V$, and a vehicle of bounded capacity $Q$. The goal is to find a minimum-cost collection of tours for the vehicle that return to the depot, each visiting at most $Q$ nodes, such that they cover all the nodes. This generalizes classic TSP and has been studied extensively. In the more general setting, each node $v$ has a demand $d_v$ and the total demand of each tour must be no more than $Q$. Either the demand of each node must be served by one tour (unsplittable) or can be served by multiple tours (splittable). The best known approximation algorithm for general graphs has ratio $α+2(1-ε)$ (for the unsplittable) and $α+1-ε$ (for the splittable) for some fixed $ε>\frac{1}{3000}$, where $α$ is the best approximation for TSP. Even for the case of trees, the best approximation ratio is $4/3$ by Becker (2018) and it has been an open question if there is an approximation scheme for this simple class of graphs. Das and Mathieu (2015) presented an approximation scheme with time $n^{\log^{O(1/ε)}n}$ for Euclidean plane $\mathbb{R}^2$. No other approximation scheme is known for any other class of metrics (without further restrictions on $Q$).
In this paper, we make significant progress on this classic problem by presenting Quasi-Polynomial Time Approximation Schemes (QPTAS) for graphs of bounded treewidth, graphs of bounded highway dimension, and graphs of bounded doubling dimension. For comparison, our result implies an approximation scheme for Euclidean plane with run time $n^{O(\log^{10}n/ε^{9})}$.

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2104.04598 [pdf, other]

Cross-Modal learning for Audio-Visual Video Parsing

Authors: Jatin Lamba, Abhishek, Jayaprakash Akula, Rishabh Dabral, Preethi Jyothi, Ganesh Ramakrishnan

Abstract: In this paper, we present a novel approach to the audio-visual video parsing (AVVP) task that demarcates events from a video separately for audio and visual modalities. The proposed parsing approach simultaneously detects the temporal boundaries in terms of start and end times of such events. We show how AVVP can benefit from the following techniques geared towards effective cross-modal learning: (i) adversarial training and skip connections, (ii) global context-aware attention, and (iii) self-supervised pretraining using an audio-video grounding objective to obtain cross-modal audio-video representations. We present extensive experimental evaluations on the Look, Listen, and Parse (LLP) dataset and show that we outperform the state-of-the-art Hybrid Attention Network (HAN) on all five metrics proposed for AVVP. We also present several ablations to validate the effect of pretraining, global attention and adversarial training.

Submitted 21 June, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: Work accepted at Interspeech 2021

arXiv:2103.05457 [pdf, other]

Rudder: A Cross Lingual Video and Text Retrieval Dataset

Authors: Jayaprakash A, Abhishek, Rishabh Dabral, Ganesh Ramakrishnan, Preethi Jyothi

Abstract: Video retrieval using natural language queries requires learning semantically meaningful joint embeddings between the text and the audio-visual input. Often, such joint embeddings are learnt using pairwise (or triplet) contrastive loss objectives which cannot give enough attention to 'difficult-to-retrieve' samples during training. This problem is especially pronounced in data-scarce settings where the data is relatively small (10% of the large scale MSR-VTT) to cover the rather complex audio-visual embedding space. In this context, we introduce Rudder - a multilingual video-text retrieval dataset that includes audio and textual captions in Marathi, Hindi, Tamil, Kannada, Malayalam and Telugu. Furthermore, we propose to compensate for data scarcity by using domain knowledge to augment supervision. To this end, in addition to the conventional three samples of a triplet (anchor, positive, and negative), we introduce a fourth term - a partial - to define a differential margin based partial-order loss. The partials are heuristically sampled such that they semantically lie in the overlap zone between the positives and the negatives, thereby resulting in broader embedding coverage. Our proposals consistently outperform the conventional max-margin and triplet losses and improve the state-of-the-art on MSR-VTT and DiDeMo datasets.
We report benchmark results on Rudder while also observing significant gains using the proposed partial-order loss, especially when the language-specific retrieval models are jointly trained by availing the cross-lingual alignment across the language-specific datasets.

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2103.00686 [pdf, other]

doi 10.14778/3485450.3485462

Accelerating Recommendation System Training by Leveraging Popular Choices

Authors: Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Abstract: Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representations of items' and users' categorical variables (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models is evolving to require increasing data and compute resources. The highly parallel neural network portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper deep dives into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, the accesses to the embeddings are highly skewed with a few embedding entries being accessed up to 10000x more. This paper leverages this asymmetrical access pattern to offer a framework, called FAE, and proposes a hot-embedding aware data layout for training recommender models. This layout utilizes the scarce GPU memory for storing the highly accessed embeddings, thus reducing the data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the executions of these hot embedding entries.
Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3x and 1.52x in comparison to XDL CPU-only and XDL CPU-GPU execution while maintaining baseline accuracy.

Submitted 28 September, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

ACM Class: I.2.6; C.5.0

Journal ref: Proceedings of the VLDB Endowment, 2022

arXiv:2103.00178 [pdf, other]

doi 10.1016/j.physleta.2021.127521

Doubly Forced Anharmonic Oscillator Model for Floating Potential Fluctuations in DC Glow Discharge Plasma

Authors: K. Jayaprakash, Prince Alex, A. Saravanan, M. Perumal, Thangjam Rishikanta Singh, Suraj Kumar Sinha

Abstract: The Floating Potential Fluctuations (FPF) observed in a dc glow discharge plasma powered with two sources are modeled using an anharmonic oscillator with two forcing terms. In the discharge system, one of the electrodes is biased to a negative voltage source (i.e. cathode), and the second electrode is biased to a positive voltage source (i.e. anode), while the stainless-steel vacuum chamber is grounded. The dc glow discharge plasma is generated by the application of negative voltage on the cathode with respect to the grounded chamber using one of the power supplies. The application of positive voltage to the anode using the second power supply results in the formation of a potential structure once the triggering criteria are met. This potential structure is referred to as the anodic double layer (ADL). The evolution of the ADL is associated with FPF. Therefore, FPF is analyzed to characterize the ADL's dynamical features. In this work, the experimentally observed FPF is compared with numerically obtained oscillations using an anharmonic oscillator model with two forcing terms. Each of these forcing terms is associated with one of the two power supplies used in the experiment. The experimentally and numerically obtained oscillations are studied using phase-space plots, FFT, and the Largest Lyapunov Exponent (LLE). The dynamical features of oscillations obtained by the model show strong agreement with the experiment, and the model can be extended for a description of complex systems driven by multiple forces.

Submitted 27 February, 2021; originally announced March 2021.

Comments: 11 pages, 7 figures

arXiv:2008.08887 [pdf, other]

Strong Exciton-Photon Coupling in Large Area MoSe$_2$ and WSe$_2$ Heterostructures Fabricated from Two-Dimensional Materials Grown by Chemical Vapor Deposition

Authors: Daniel J. Gillard, Armando Genco, Seongjoon Ahn, Thomas P. Lyons, Kyung Yeol Ma, A-Rang Jang, Toby Severs Millard, Aurelien A. P. Trichet, Rahul Jayaprakash, Kyriacos Georgiou, David G. Lidzey, Jason M. Smith, Hyeon Suk Shin, Alexander I. Tartakovskii

Abstract: Two-dimensional semiconducting transition metal dichalcogenides embedded in optical microcavities in the strong exciton-photon coupling regime may lead to promising applications in spin and valley addressable polaritonic logic gates and circuits. One significant obstacle for their realization is the inherent lack of scalability associated with the mechanical exfoliation commonly used for fabrication of two-dimensional materials and their heterostructures. Chemical vapor deposition offers an alternative scalable fabrication method for both monolayer semiconductors and other two-dimensional materials, such as hexagonal boron nitride. Observation of the strong light-matter coupling in chemical vapor grown transition metal dichalcogenides has been demonstrated so far in a handful of experiments with monolayer molybdenum disulfide and tungsten disulfide. Here we instead demonstrate the strong exciton-photon coupling in microcavities comprising large area transition metal dichalcogenide / hexagonal boron nitride heterostructures made from chemical vapor deposition grown molybdenum diselenide and tungsten diselenide encapsulated on one or both sides in continuous few-layer boron nitride films also grown by chemical vapor deposition.
These transition metal dichalcogenide / hexagonal boron nitride heterostructures show high optical quality comparable with mechanically exfoliated samples, allowing operation in the strong coupling regime in a wide range of temperatures down to 4 Kelvin in tunable and monolithic microcavities, and demonstrating the possibility to successfully develop large area transition metal dichalcogenide based polariton devices.

Submitted 20 August, 2020; originally announced August 2020.

arXiv:2006.08361 [pdf, other]

An Unsupervised Machine Learning Approach to Assess the ZIP Code Level Impact of COVID-19 in NYC

Authors: Fadoua Khmaissia, Pegah Sagheb Haghighi, Aarthe Jayaprakash, Zhenwei Wu, Sokratis Papadopoulos, Yuan Lai, Freddy T. Nguyen

Abstract: New York City has been recognized as the world's epicenter of the novel Coronavirus pandemic. To identify the key inherent factors that are highly correlated to the Increase Rate of COVID-19 new cases in NYC, we propose an unsupervised machine learning framework. Based on the assumption that ZIP code areas with similar demographic, socioeconomic, and mobility patterns are likely to experience similar outbreaks, we select the most relevant features to perform a clustering that can best reflect the spread, and map them down to 9 interpretable categories. We believe that our findings can guide policy makers to promptly anticipate and prevent the spread of the virus by taking the right measures.

Submitted 18 September, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Presented at ICML 2020 Workshop on the Healthcare Systems, Population Health, and the Role of Health-Tech

arXiv:1909.03220 [pdf]

Ultrafast long-range energy transport via light-matter coupling in organic semiconductor films

Authors: Raj Pandya, Richard Y. S. Chen, Qifei Gu, Jooyoung Sung, Christoph Schnedermann, Oluwafemi S. Ojambati, Rohit Chikkaraddy, Jeffrey Gorman, Gianni Jacucci, Olimpia D. Onelli, Tom Willhammar, Duncan N. Johnstone, Sean M. Collins, Paul A. Midgley, Florian Auras, Tomi Baikie, Rahul Jayaprakash, Fabrice Mathevet, Richard Soucek, Matthew Du, Silvia Vignolini, David G Lidzey, Jeremy J. Baumberg, Richard H. Friend, Thierry Barisien , et al. (7 additional authors not shown)

Abstract: The formation of exciton-polaritons allows the transport of energy over hundreds of nanometres at velocities up to 10^6 m s^-1 in organic semiconductor films in the absence of external cavity structures.

Submitted 7 September, 2019; originally announced September 2019.

arXiv:1909.00553 [pdf, ps, other]

doi 10.1145/3352460.3358281

Touché: Towards Ideal and Efficient Cache Compression By Mitigating Tag Area Overheads

Authors: Seokin Hong, Bulent Abali, Alper Buyuktosunoglu, Michael B. Healy, Prashant J. Nair

Abstract: Compression is seen as a simple technique to increase the effective cache capacity. Unfortunately, compression techniques either incur tag area overheads or restrict data placement to only include neighboring compressed cache blocks to mitigate tag area overheads. Ideally, we should be able to place arbitrary compressed cache blocks without any placement restrictions and tag area overheads. This paper proposes Touché, a framework that enables storing multiple arbitrary compressed cache blocks within a physical cacheline without any tag area overheads. The Touché framework consists of three components. The first component, called the "Signature" (SIGN) engine, creates shortened signatures from the tag addresses of compressed blocks. Due to this, the SIGN engine can store multiple signatures in each tag entry. On a cache access, the physical cacheline is accessed only if there is a signature match (which has a negligible probability of false positive). The second component, called the "Tag Appended Data" (TADA) mechanism, stores the full tag addresses with data. TADA enables Touché to detect false positive signature matches by ensuring that the actual tag address is available for comparison. The third component, called the "Superblock Marker" (SMARK) mechanism, uses a unique marker in the tag entry to indicate the occurrence of compressed cache blocks from neighboring physical addresses in the same cacheline. Touché is completely hardware-based and achieves an average speedup of 12% (ideal 13%) when compared to an uncompressed baseline.

Submitted 2 September, 2019; originally announced September 2019.

Comments: Keywords: Compression, Caches, Tag Array, Data Array, Hashing

Journal ref: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, October 2019, Pages 453-465

arXiv:1806.09990 [pdf]

doi 10.1039/C9SC04950A

Manipulating matter with strong coupling: harvesting triplet excitons in organic exciton microcavities

Authors: Daniel Polak, Rahul Jayaprakash, Anastasia Leventis, Kealan J. Fallon, Harriet Coulthard, Anthony J. Petty II, John Anthony, Hugo Bronstein, David G. Lidzey, Jenny Clark, Andrew J. Musser

Abstract: Exciton-polaritons are quasiparticles with mixed photon and exciton character that demonstrate rich quantum phenomena, novel optoelectronic devices and the potential to modify chemical properties of materials. Organic semiconductors are of current interest for their room-temperature polariton formation. However, within organic optoelectronic devices, it is often the 'dark' spin-1 triplet excitons that dominate operation. These triplets have been largely ignored in treatments of polariton physics. Here we demonstrate polariton population from the triplet manifold via triplet-triplet annihilation, leading to polariton emission that is longer-lived (>microseconds) even than exciton emission in bare films. This enhancement arises from spin-2 triplet-pair states, formed by singlet fission or triplet-triplet annihilation, feeding the polariton. This is possible due to state mixing, which -in the strong coupling regime- leads to sharing of photonic character with states that are formally non-emissive. Such 'photonic sharing' offers the enticing possibility of harvesting or manipulating even states that are formally dark.

Submitted 26 June, 2018; originally announced June 2018.

Comments: 20 pages, 6 figures

Journal ref: Chem. Sci., 2020, 11, 343-354

Showing 1–50 of 73 results for author: JayaPrakash