Search | arXiv e-print repository

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding

Authors: Haoran Zhou, Xingchen Song, Brendan Fahy, Qiaochu Song, Binbin Zhang, Zhendong Peng, Anshul Wadhawan, Denglin Jiang, Apurv Verma, Vinay Ramesh, Srivas Prasad, Michele M. Franceschini

Abstract: OpenAI Whisper is a family of robust Automatic Speech Recognition (ASR) models trained on 680,000 hours of audio. However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) structure. We introduce an additional Connectionist Temporal Classification (CTC) decoder trained with causal attention masks to generate streaming partial transcripts, while the original Whisper decoder reranks these partial outputs. Our experiments on LibriSpeech and an earnings call dataset demonstrate that, with adequate fine-tuning data, Whisper can be adapted into a capable streaming ASR model. We also introduce a hybrid tokenizer approach, which uses a smaller token space for the CTC decoder while retaining Whisper's original token space for the attention decoder, resulting in improved data efficiency and generalization.

Submitted 13 June, 2025; originally announced June 2025.

Comments: Accepted to INTERSPEECH 2025

arXiv:2506.03164 [pdf, ps, other]

Test-Time Scaling of Diffusion Models via Noise Trajectory Search

Authors: Vignav Ramesh, Morteza Mardani

Abstract: The iterative and stochastic nature of diffusion models enables test-time scaling, whereby spending additional compute during denoising generates higher-fidelity samples. Increasing the number of denoising steps is the primary scaling axis, but this yields quickly diminishing returns. Optimizing the noise trajectory--the sequence of injected noise vectors--is a promising alternative, as the specific noise realizations critically affect sample quality; but this is challenging due to a high-dimensional search space, complex noise-outcome interactions, and costly trajectory evaluations. We address this by first casting diffusion as a Markov Decision Process (MDP) with a terminal reward, showing tree-search methods such as Monte Carlo tree search (MCTS) to be meaningful but impractical. To balance performance and efficiency, we then resort to a relaxation of the MDP, where we view denoising as a sequence of independent contextual bandits. This allows us to introduce an $\epsilon$-greedy search algorithm that globally explores at extreme timesteps and locally exploits during the intermediate steps where de-mixing occurs. Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned/text-to-image generation, exceeding baselines by up to $164\%$ and matching/exceeding MCTS performance. To our knowledge, this is the first practical method for test-time noise trajectory optimization of arbitrary (non-differentiable) rewards.

Submitted 24 May, 2025; originally announced June 2025.

arXiv:2505.24002 [pdf, ps, other]

DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment

Authors: Vaishnav Ramesh, Junliang Liu, Haining Wang, Md Jahidul Islam

Abstract: A long-held challenge in no-reference image quality assessment (NR-IQA) learning from human subjective perception is the lack of objective generalization to unseen natural distortions. To address this, we integrate a novel Depth-Guided cross-attention and refinement (Depth-CAR) mechanism, which distills scene depth and spatial features into a structure-aware representation for improved NR-IQA. This brings in the knowledge of object saliency and relative contrast of the scene for more discriminative feature learning. Additionally, we introduce the idea of TCB (Transformer-CNN Bridge) to fuse high-level global contextual dependencies from a transformer backbone with local spatial features captured by a set of hierarchical CNN (convolutional neural network) layers. We implement TCB and Depth-CAR as multimodal attention-based projection functions to select the most informative features, which also improve training time and inference efficiency. Experimental results demonstrate that our proposed DGIQA model achieves state-of-the-art (SOTA) performance on both synthetic and authentic benchmark datasets. More importantly, DGIQA outperforms SOTA models on cross-dataset evaluations as well as in assessing natural image distortions such as low-light effects, hazy conditions, and lens flares.

Submitted 29 May, 2025; originally announced May 2025.

Comments: 18 pages

arXiv:2504.07421 [pdf, other]

AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery

Authors: Amirhossein Abaskohi, Amrutha Varshini Ramesh, Shailesh Nanisetty, Chirag Goel, David Vazquez, Christopher Pal, Spandana Gella, Giuseppe Carenini, Issam H. Laradji

Abstract: We introduce AgentAda, the first LLM-powered analytics agent that can learn and use new analytics skills to extract more specialized insights. Unlike existing methods that require users to manually decide which data analytics method to apply, AgentAda automatically identifies the skill needed from a library of analytical skills to perform the analysis. This also allows AgentAda to use skills that existing LLMs cannot perform out of the box. The library covers a range of methods, including clustering, predictive modeling, and NLP techniques like BERT, which allow AgentAda to handle complex analytics tasks based on what the user needs. AgentAda's dataset-to-insight extraction strategy consists of three key steps: (I) a question generator to generate queries relevant to the user's goal and persona, (II) a hybrid Retrieval-Augmented Generation (RAG)-based skill matcher to choose the best data analytics skill from the skill library, and (III) a code generator that produces executable code based on the retrieved skill's documentation to extract key patterns. We also introduce KaggleBench, a benchmark of curated notebooks across diverse domains, to evaluate AgentAda's performance. We conducted a human evaluation demonstrating that AgentAda provides more insightful analytics than existing tools, with 48.78% of evaluators preferring its analyses, compared to 27.67% for the unskilled agent. We also propose a novel LLM-as-a-judge approach that we show is aligned with human evaluation as a way to automate insight quality evaluation at larger scale.

Submitted 9 April, 2025; originally announced April 2025.

arXiv:2503.19096 [pdf, other]

Uncertainty-Aware Decomposed Hybrid Networks

Authors: Sina Ditzel, Achref Jaziri, Iuliia Pliushch, Visvanathan Ramesh

Abstract: The robustness of image recognition algorithms remains a critical challenge, as current models often depend on large quantities of labeled data. In this paper, we propose a hybrid approach that combines the adaptability of neural networks with the interpretability, transparency, and robustness of domain-specific quasi-invariant operators. Our method decomposes the recognition into multiple task-specific operators that focus on different characteristics, supported by a novel confidence measurement tailored to these operators. This measurement enables the network to prioritize reliable features and accounts for noise. We argue that our design enhances transparency and robustness, leading to improved performance, particularly in low-data regimes. Experimental results in traffic sign detection highlight the effectiveness of the proposed method, especially in semi-supervised and unsupervised scenarios, underscoring its potential for data-constrained applications.

Submitted 24 March, 2025; originally announced March 2025.

arXiv:2503.11893 [pdf, ps, other]

UStyle: Waterbody Style Transfer of Underwater Scenes by Depth-Guided Feature Synthesis

Authors: Md Abu Bakr Siddique, Vaishnav Ramesh, Junliang Liu, Piyush Singh, Md Jahidul Islam

Abstract: The concept of waterbody style transfer remains largely unexplored in the underwater imaging and vision literature. Traditional image style transfer (STx) methods primarily focus on artistic and photorealistic blending, often failing to preserve object and scene geometry in images captured in high-scattering media such as underwater. The wavelength-dependent nonlinear attenuation and depth-dependent backscattering artifacts further complicate learning underwater image STx from unpaired data. This paper introduces UStyle, the first data-driven learning framework for transferring waterbody styles across underwater images without requiring prior reference images or scene information. We propose a novel depth-aware whitening and coloring transform (DA-WCT) mechanism that integrates physics-based waterbody synthesis to ensure perceptually consistent stylization while preserving scene structure. To enhance style transfer quality, we incorporate carefully designed loss functions that guide UStyle to maintain colorfulness, lightness, structural integrity, and frequency-domain characteristics, as well as high-level content in VGG and CLIP (contrastive language-image pretraining) feature spaces. By addressing domain-specific challenges, UStyle provides a robust framework for no-reference underwater image STx, surpassing state-of-the-art (SOTA) methods that rely solely on end-to-end reconstruction loss. Furthermore, we introduce the UF7D dataset, a curated collection of high-resolution underwater images spanning seven distinct waterbody styles, establishing a benchmark to support future research in underwater image STx. The UStyle inference pipeline and UF7D dataset are released at: https://github.com/uf-robopi/UStyle.

Submitted 7 June, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

arXiv:2502.05384 [pdf, other]

Demonstrating CavePI: Autonomous Exploration of Underwater Caves by Semantic Guidance

Authors: Alankrit Gupta, Adnan Abdullah, Xianyao Li, Vaishnav Ramesh, Ioannis Rekleitis, Md Jahidul Islam

Abstract: Enabling autonomous robots to safely and efficiently navigate, explore, and map underwater caves is of significant importance to water resource management, hydrogeology, archaeology, and marine robotics. In this work, we demonstrate the system design and algorithmic integration of a visual servoing framework for semantically guided autonomous underwater cave exploration. We present the hardware and edge-AI design considerations to deploy this framework on a novel AUV (Autonomous Underwater Vehicle) named CavePI. The guided navigation is driven by a computationally light yet robust deep visual perception module, delivering a rich semantic understanding of the environment. Subsequently, a robust control mechanism enables CavePI to track the semantic guides and navigate within complex cave structures. We evaluate the system through field experiments in natural underwater caves and spring-water sites and further validate its ROS (Robot Operating System)-based digital twin in a simulation environment. Our results highlight how these integrated design choices facilitate reliable navigation under feature-deprived, GPS-denied, and low-visibility conditions.

Submitted 24 April, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

Comments: V4, 17 pages

arXiv:2501.14082 [pdf, other]

Communicating Activations Between Language Model Agents

Authors: Vignav Ramesh, Kenneth Li

Abstract: Communication between multiple language model (LM) agents has been shown to scale up the reasoning ability of LMs. While natural language has been the dominant medium for inter-LM communication, it is not obvious this should be the standard: not only does natural language communication incur high inference costs that scale quickly with the number of both agents and messages, but also the decoding process abstracts away too much rich information that could be otherwise accessed from the internal activations. In this work, we propose a simple technique whereby LMs communicate via activations; concretely, we pause an LM $\textit{B}$'s computation at an intermediate layer, combine its current activation with another LM $\textit{A}$'s intermediate activation via some function $\textit{f}$, then pass $\textit{f}$'s output into the next layer of $\textit{B}$ and continue the forward pass till decoding is complete. This approach scales up LMs on new tasks with zero additional parameters and data, and saves a substantial amount of compute over natural language communication. We test our method with various functional forms $\textit{f}$ on two experimental setups--multi-player coordination games and reasoning benchmarks--and find that it achieves up to $27.0\%$ improvement over natural language communication across datasets with $<1/4$ the compute, illustrating the superiority and robustness of activations as an alternative "language" for communication between LMs.

Submitted 7 May, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

Comments: ICML 2025
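
The mechanism described in the abstract above (pause B at an intermediate layer, merge in A's activation via f, resume B) can be illustrated with a toy sketch. This is not the authors' implementation: the "models" are hypothetical stacks of random tanh layers built with NumPy, both models are assumed to share the same width and to be tapped at the same depth, and the names `make_model`, `run_layers`, and `communicate` are invented for illustration.

```python
import numpy as np

DIM, N_LAYERS = 8, 4  # assumed toy sizes, not taken from the paper


def make_model(seed):
    """Hypothetical stand-in for an LM: a stack of random dense layers."""
    r = np.random.default_rng(seed)
    return [r.normal(scale=0.5, size=(DIM, DIM)) for _ in range(N_LAYERS)]


def run_layers(weights, h, start, stop):
    """Apply layers start..stop-1 (tanh nonlinearity) to hidden state h."""
    for w in weights[start:stop]:
        h = np.tanh(h @ w)
    return h


def communicate(model_a, model_b, x_a, x_b, layer, f):
    """Run B, but replace its activation at `layer` with f(A's, B's)."""
    h_a = run_layers(model_a, x_a, 0, layer)  # A's intermediate activation
    h_b = run_layers(model_b, x_b, 0, layer)  # B paused at the same depth
    merged = f(h_a, h_b)                      # combine via the chosen f
    return run_layers(model_b, merged, layer, N_LAYERS)  # resume B


model_a, model_b = make_model(1), make_model(2)
x = np.ones(DIM)

# Three illustrative choices of f: average, sum, and replace-with-A.
out_mean = communicate(model_a, model_b, x, x, 2, lambda a, b: (a + b) / 2)
out_sum = communicate(model_a, model_b, x, x, 2, lambda a, b: a + b)
out_replace = communicate(model_a, model_b, x, x, 2, lambda a, b: a)
```

A useful sanity check for any such sketch is that choosing f(a, b) = b reproduces B's ordinary forward pass, since no information from A is then injected.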

arXiv:2411.02436 [pdf, ps, other]

Degeneracies In a Weighted Sum of Two Squares

Authors: Ishan Vinayagam Ramesh, Maxim Olshanii

Abstract: This work is an attempt to classify and quantify instances when a weighted sum of two squares of positive integers, $3n_{1}^2+n_{2}^2$, can be realized in more than one way. Our project was inspired by a particular study of two-dimensional quantum billiards [S. G. Jackson, H. Perrin, G. E. Astrakharchik, and M. Olshanii, SciPost Phys. Core 7, 062 (2024)] where the weighted sum of interest represents an energy level with the two integers being the billiard's quantum numbers; there, the 3-fold degeneracies seem to dominate the energy spectrum. Interestingly, contrary to the conventional paradigm, these degeneracies are not caused by some non-commuting symmetries of the system.

Submitted 31 March, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

MSC Class: 81Q80
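
The degeneracies discussed in the abstract above can be explored by brute force. The sketch below is only an illustration, not the authors' method: it enumerates every pair of positive integers (n1, n2) with 3*n1^2 + n2^2 up to an arbitrarily chosen cutoff and groups the pairs by the value they produce. The function name `representations` and the cutoff of 100 are assumptions made for the example.

```python
from collections import defaultdict
from math import isqrt


def representations(limit):
    """Map each E <= limit to all positive (n1, n2) with 3*n1**2 + n2**2 == E."""
    reps = defaultdict(list)
    n1 = 1
    while 3 * n1 * n1 + 1 <= limit:
        # Largest n2 that keeps the weighted sum within the cutoff.
        for n2 in range(1, isqrt(limit - 3 * n1 * n1) + 1):
            reps[3 * n1 * n1 + n2 * n2].append((n1, n2))
        n1 += 1
    return reps


reps = representations(100)
# Keep only values that can be realized in more than one way.
degenerate = {e: r for e, r in sorted(reps.items()) if len(r) > 1}

for value, pairs in degenerate.items():
    print(value, pairs)
```

The smallest degenerate value is 28, which is already 3-fold: 28 = 3·1² + 5² = 3·2² + 4² = 3·3² + 1².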

arXiv:2408.09838 [pdf, other]

Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion

Authors: Achref Jaziri, Etienne Künzel, Visvanathan Ramesh

Abstract: A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance… ▽ More A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors are constantly changing, and the search space is vast. In this work, we propose addressing these challenges in the train scheduling problem using curriculum learning. We design a curriculum with adjacent skills that build on each other to improve generalization performance. Introducing a curriculum with distinct tasks introduces non-stationarity, which we address by proposing a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through EWC while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other adapted methods for continual learning, highlighting the potential of our method in managing the stability-plasticity dilemma in the adaptive train scheduling setting. △ Less

Submitted 5 March, 2025; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: 9 Pages, 2 Figures

arXiv:2407.19063 [pdf, other]

Stochastic Thermodynamics of a Linear Optical Cavity Driven On Resonance

Authors: Vashist G. Ramesh, Joris Busink, Rene E. R. Moesbergen, Kevin J. H. Peters, Philip J. Ackermans, Said K. R. Rodriguez

Abstract: We present a complete framework of stochastic thermodynamics for a single-mode linear optical cavity driven on resonance. We first show that the steady-state intra-cavity field follows the equilibrium Boltzmann distribution. The effective temperature is given by the noise variance, and the equilibration rate is the dissipation rate. Next we derive expressions for internal energy, work, heat, and free energy of light in a cavity, and formulate the first and second laws of thermodynamics for this system. We then analyze fluctuations in work and heat, and show that they obey universal statistical relations known as fluctuation theorems. Finite time corrections to the fluctuation theorems are also discussed. Additionally, we show that work fluctuations obey the Crooks fluctuation theorem, which is a paradigm for understanding emergent phenomena and estimating free energy differences. The significance of our results is two-fold. On one hand, our work positions optical cavities as a unique platform for fundamental studies of stochastic thermodynamics. On the other hand, our work paves the way for improving the energy efficiency and information processing capabilities of laser-driven optical resonators using a thermodynamics-based prescription.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 15 pages, 7 figures

arXiv:2407.13922 [pdf, other]

Synthetic Counterfactual Faces

Authors: Guruprasad V Ramesh, Harrison Rosenberg, Ashish Hooda, Shimaa Ahmed Kassem Fawaz

Abstract: Computer vision systems have been deployed in various applications involving biometrics like human faces. These systems can identify social media users, search for missing persons, and verify the identity of individuals. While computer vision models are often evaluated for accuracy on available benchmarks, more annotated data is necessary to learn about their robustness and fairness against semantic distributional shifts in input data, especially in face data. Among annotated data, counterfactual examples grant strong explainability characteristics. Because collecting natural face data is prohibitively expensive, we put forth a generative AI-based framework to construct targeted, counterfactual, high-quality synthetic face data. Our synthetic data pipeline has many use cases, including sensitivity evaluations of face recognition systems and probes of image understanding systems. The pipeline is validated with multiple user studies. We showcase the efficacy of our face generation pipeline on a leading commercial vision model. We identify facial attributes that cause vision systems to fail.

Submitted 29 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Paper under review. Full text and results will be updated after acceptance

arXiv:2407.13102 [pdf, other]

Tree semantic segmentation from aerial image time series

Authors: Venkatesh Ramesh, Arthur Ouaknine, David Rolnick

Abstract: Earth's forests play an important role in the fight against climate change, and are in turn negatively affected by it. Effective monitoring of different tree species is essential to understanding and improving the health and biodiversity of forests. In this work, we address the challenge of tree species identification by performing semantic segmentation of trees using an aerial image dataset spanning over a year. We compare models trained on single images versus those trained on time series to assess the impact of tree phenology on segmentation performance. We also introduce a simple convolutional block for extracting spatio-temporal features from image time series, enabling the use of popular pretrained backbones and methods. We leverage the hierarchical structure of tree species taxonomy by incorporating a custom loss function that refines predictions at three levels: species, genus, and higher-level taxa. Our findings demonstrate the superiority of our methodology in exploiting the time series modality and confirm that enriching labels using taxonomic information improves the semantic segmentation performance.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 19 pages, 8 figures, 4 tables. Preprint under review

arXiv:2407.00438 [pdf, other]

AI Age Discrepancy: A Novel Parameter for Frailty Assessment in Kidney Tumor Patients

Authors: Rikhil Seshadri, Jayant Siva, Angelica Bartholomew, Clara Goebel, Gabriel Wallerstein-King, Beatriz López Morato, Nicholas Heller, Jason Scovell, Rebecca Campbell, Andrew Wood, Michal Ozery-Flato, Vesna Barros, Maria Gabrani, Michal Rosen-Zvi, Resha Tejpaul, Vidhyalakshmi Ramesh, Nikolaos Papanikolopoulos, Subodh Regmi, Ryan Ward, Robert Abouassaly, Steven C. Campbell, Erick Remer, Christopher Weight

Abstract: Kidney cancer is a global health concern, and accurate assessment of patient frailty is crucial for optimizing surgical outcomes. This paper introduces AI Age Discrepancy, a novel metric derived from machine learning analysis of preoperative abdominal CT scans, as a potential indicator of frailty and postoperative risk in kidney cancer patients. This retrospective study of 599 patients from the 2023 Kidney Tumor Segmentation (KiTS) challenge dataset found that a higher AI Age Discrepancy is significantly associated with longer hospital stays and lower overall survival rates, independent of established factors. This suggests that AI Age Discrepancy may provide valuable insights into patient frailty and could thus inform clinical decision-making in kidney cancer treatment.

Submitted 2 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: 10 pages, 3 figures, 2 tables

arXiv:2406.17296 [pdf, other]

BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

Authors: Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt

Abstract: Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA) add trainable low-rank matrix factorizations, altering the training dynamics and limiting the model's parameter search to a low-rank subspace. GaLore, a more recent method, employs Gradient Low-Rank Projection to reduce the memory footprint in the full parameter training setting. However, GaLore can only be applied to a subset of the LLM layers that satisfy the "reversibility" property, thus limiting its applicability. In response to these challenges, we introduce BlockLLM, an approach inspired by block coordinate descent. Our method carefully selects and updates a very small subset of the trainable parameters without altering any part of its architecture and training procedure. BlockLLM achieves state-of-the-art performance in both finetuning and pretraining tasks, while reducing the memory footprint of the underlying optimization process. Our experiments demonstrate that, when fine-tuning fewer than 5% of the parameters, BlockLLM achieves state-of-the-art perplexity scores on the GLUE benchmarks. On a Llama model pretrained on the C4 dataset, BlockLLM is able to train with significantly less memory than the state-of-the-art, while still maintaining competitive performance.

Submitted 15 December, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2405.01741 [pdf, other]

PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters

Authors: Xun Jiao, Fred Lin, Harish D. Dixit, Joel Coburn, Abhinav Pandey, Han Wang, Venkat Ramesh, Jianyu Huang, Wang Xu, Daniel Moore, Sriram Sankar

Abstract: Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDCs), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can potentially lead to incorrect or degraded model output for users, ultimately affecting the quality and reliability of AI services. In light of the escalating threat, it is crucial to address key questions: How vulnerable are AI models to parameter corruptions, and how do different components (such as modules, layers) of the models exhibit varying vulnerabilities to parameter corruptions? To systematically address these questions, we propose a novel quantitative metric, Parameter Vulnerability Factor (PVF), inspired by the architectural vulnerability factor (AVF) in the computer architecture community, aiming to standardize the quantification of AI model vulnerability against parameter corruptions. We define a model parameter's PVF as the probability that a corruption in that particular model parameter will result in an incorrect output. In this paper, we present several use cases on applying PVF to three types of tasks/models during inference -- recommendation (DLRM), vision classification (CNN), and text classification (BERT), while presenting an in-depth vulnerability analysis on DLRM. PVF can provide pivotal insights to AI hardware designers in balancing the tradeoff between fault protection and performance/efficiency, such as mapping vulnerable AI parameter components to well-protected hardware modules. The PVF metric is applicable to any AI model and has the potential to help unify and standardize AI vulnerability/resilience evaluation practice.

Submitted 11 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2402.10791 [pdf, other]

Weak Ergodicity Breaking in Optical Sensing

Authors: V. G. Ramesh, S. R. K. Rodriguez

Abstract: The time-integrated intensity transmitted by a laser driven resonator obeys Lévy's arcsine laws [Ramesh \textit{et al.}, Phys. Rev. Lett. \textit{in press} (2024)]. Here we demonstrate the implications of these laws for optical sensing. We consider the standard goal of resonant optical sensors, namely to report a perturbation to their resonance frequency. In this context, we quantify the sensing precision attained using a finite energy budget combined with time or ensemble averaging of the time-integrated intensity. We find that ensemble averaging outperforms time averaging for short measurement times, but the advantage disappears as the measurement time increases. We explain this behavior in terms of weak ergodicity breaking, arising when the time for the time-integrated intensity to explore the entire phase space diverges but the measurement time remains finite. Evidence that the former time diverges is presented in first passage and return time distributions. Our results are relevant to all types of sensors, in optics and beyond, where stochastic time-integrated fields or intensities are measured to detect an event. In particular, choosing the right averaging strategy can improve sensing precision by orders of magnitude with zero energy cost.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2401.10325 [pdf, other]

A dual-species Rydberg array

Authors: Shraddha Anand, Conor E. Bradley, Ryan White, Vikram Ramesh, Kevin Singh, Hannes Bernien

Abstract: Rydberg atom arrays have emerged as a leading platform for quantum information science. Reaching system sizes of hundreds of long-lived qubits, these arrays are used for highly coherent analog quantum simulation, as well as digital quantum computation. Advanced quantum protocols such as quantum error correction, however, require midcircuit qubit operations, including the replenishment, reset, and readout of a subset of qubits. A compelling strategy to achieve these capabilities is a dual-species architecture in which a second atomic species can be controlled without crosstalk, and entangled with the first via Rydberg interactions. Here, we realize a dual-species Rydberg array consisting of rubidium (Rb) and cesium (Cs) atoms, and explore new regimes of interactions and dynamics not accessible in single-species architectures. We achieve enhanced interspecies interactions by electrically tuning the Rydberg states close to a Forster resonance. In this regime, we demonstrate interspecies Rydberg blockade and implement quantum state transfer from one species to another. We then generate a Bell state between Rb and Cs hyperfine qubits via an interspecies controlled-phase gate. Finally, we combine interspecies entanglement with native midcircuit readout to achieve quantum non-demolition measurement of a Rb qubit using an auxiliary Cs qubit. The techniques demonstrated here pave the way toward scalable measurement-based protocols and real-time feedback control in large-scale quantum systems.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.08603 [pdf, other]

Representation Learning in a Decomposed Encoder Design for Bio-inspired Hebbian Learning

Authors: Achref Jaziri, Sina Ditzel, Iuliia Pliushch, Visvanathan Ramesh

Abstract: Modern data-driven machine learning system designs exploit inductive biases in architectural structure, invariance and equivariance requirements, task-specific loss functions, and computational optimization tools. Previous works have illustrated that human-specified quasi-invariant filters can serve as a powerful inductive bias in the early layers of the encoder, enhancing robustness and transparency in learned classifiers. This paper explores this further within the context of representation learning with bio-inspired Hebbian learning rules. We propose a modular framework trained with a bio-inspired variant of contrastive predictive coding, comprising parallel encoders that leverage different invariant visual descriptors as inductive biases. We evaluate the representation learning capacity of our system in classification scenarios using diverse image datasets (GTSRB, STL10, CODEBRIM) and video datasets (UCF101). Our findings indicate that this form of inductive bias significantly improves the robustness of learned representations and narrows the performance gap between models using local Hebbian plasticity rules and those using backpropagation, while also achieving superior performance compared to non-decomposed encoders.

Submitted 1 March, 2025; v1 submitted 22 November, 2023; originally announced January 2024.

Comments: Published at ECCV2024 Human-Inspired Computer Vision Workshop

arXiv:2311.03721 [pdf, other]

ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning

Authors: Julia Kaltenborn, Charlotte E. E. Lange, Venkatesh Ramesh, Philippe Brouillard, Yaniv Gurwicz, Chandni Nagda, Jakob Runge, Peer Nowack, David Rolnick

Abstract: Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a "super emulator" can quickly project new climate change scenarios, complementing existing scenarios already provided to policymakers. We believe ClimateSet will create the basis needed for the ML community to tackle climate-related tasks at scale.

Submitted 6 November, 2023; originally announced November 2023.

Comments: To be published in the 37th Conference on Neural Information Processing Systems (NeurIPS 2023): Track on Datasets and Benchmarks. Project website: https://climateset.github.io/

arXiv:2310.20062 [pdf, other]

Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation

Authors: Vishal Ramesh, Rui Zhao, Naman Goel

Abstract: Synthetic data is emerging as a promising way to harness the value of data, while reducing privacy risks. The potential of synthetic data is not limited to privacy-friendly data release, but also includes complementing real data in use-cases such as training machine learning algorithms that are more fair and robust to distribution shifts etc. There is a lot of interest in algorithmic advances in synthetic data generation for providing better privacy and statistical guarantees and for its better utilisation in machine learning pipelines. However, for responsible and trustworthy synthetic data generation, it is not sufficient to focus only on these algorithmic aspects and instead, a holistic view of the synthetic data generation pipeline must be considered. We build a novel system that allows the contributors of real data to autonomously participate in differentially private synthetic data generation without relying on a trusted centre. Our modular, general and scalable solution is based on three building blocks namely: Solid (Social Linked Data), MPC (Secure Multi-Party Computation) and Trusted Execution Environments (TEEs). Solid is a specification that lets people store their data securely in decentralised data stores called Pods and control access to their data. MPC refers to the set of cryptographic methods for different parties to jointly compute a function over their inputs while keeping those inputs private. TEEs such as Intel SGX rely on hardware based features for confidentiality and integrity of code and data.
We show how these three technologies can be effectively used to address various challenges in responsible and trustworthy synthetic data generation by ensuring: 1) contributor autonomy, 2) decentralisation, 3) privacy and 4) scalability. We support our claims with rigorous empirical results on simulated and real datasets and different synthetic data generation algorithms.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2309.09637 [pdf, other]

Designing a Hybrid Neural System to Learn Real-world Crack Segmentation from Fractal-based Simulation

Authors: Achref Jaziri, Martin Mundt, Andres Fernandez Rodriguez, Visvanathan Ramesh

Abstract: Identification of cracks is essential to assess the structural integrity of concrete infrastructure. However, robust crack segmentation remains a challenging task for computer vision systems due to the diverse appearance of concrete surfaces, variable lighting and weather conditions, and the overlapping of different defects. In particular recent data-driven methods struggle with the limited availability of data, the fine-grained and time-consuming nature of crack annotation, and face subsequent difficulty in generalizing to out-of-distribution samples. In this work, we move past these challenges in a two-fold way. We introduce a high-fidelity crack graphics simulator based on fractals and a corresponding fully-annotated crack dataset. We then complement the latter with a system that learns generalizable representations from simulation, by leveraging both a pointwise mutual information estimate along with adaptive instance normalization as inductive biases. Finally, we empirically highlight how different design choices are symbiotic in bridging the simulation to real gap, and ultimately demonstrate that our introduced system can effectively handle real-world crack segmentation.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.07277 [pdf, ps, other]

Limitations of Face Image Generation

Authors: Harrison Rosenberg, Shimaa Ahmed, Guruprasad V Ramesh, Ramya Korlakai Vinayak, Kassem Fawaz

Abstract: Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework on faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.

Submitted 21 December, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted to The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2308.02013 [pdf, other]

Federated Representation Learning for Automatic Speech Recognition

Authors: Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo

Abstract: Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the pre-trained ASR encoder in FL performs as well as a centrally pre-trained model and produces an improvement of 12-15% (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% (WER) improvement over no pre-training.

Submitted 7 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted at ISCA SPSC Symposium 3rd Symposium on Security and Privacy in Speech Communication, 2023

arXiv:2307.01169 [pdf, other]

Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm

Authors: Amrutha Varshini Ramesh, Aaron Mishkin, Mark Schmidt, Yihan Zhou, Jonathan Wilder Lavington, Jennifer She

Abstract: We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee trivial progress only or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.15009 [pdf, other]

Particle-Based Simulations of Electrophoretic Deposition with Adaptive Physics Models

Authors: John J. Karnes, Andrew J. Pascall, Christoph Rehbock, Vaijayanthi Ramesh, Marcus A. Worsley, Stephan Barcikowski, Elaine Lee, Brian Giera

Abstract: This work represents an extension of mesoscale particle-based modeling of electrophoretic deposition (EPD), which has relied exclusively on pairwise interparticle interactions described by Derjaguin-Landau-Verwey-Overbeek (DLVO) theory. With this standard treatment, particles continuously move and interact via excluded volume and electrostatic pair potentials under the influence of external fields throughout the EPD process. The physics imposed by DLVO theory may not be appropriate to describe all systems, considering the vast material, operational, and application space available to EPD. As such, we present three modifications to standard particle-based models, each rooted in the ability to dynamically change interparticle interactions as simulated deposition progresses. This approach allows simulations to capture charge transfer and/or irreversible adsorption based on tunable parameters. We evaluate and compare simulated deposits formed under new physical assumptions, demonstrating the range of systems that these adaptive physics models may capture.

Submitted 28 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 34 pages, 10 figures

Report number: LLNL-JRNL-849162

arXiv:2306.02182 [pdf, other]

FlairNLP at SemEval-2023 Task 6b: Extraction of Legal Named Entities from Legal Texts using Contextual String Embeddings

Authors: Vinay N Ramesh, Rohan Eswara

Abstract: Indian court legal texts and processes are essential towards the integrity of the judicial system and towards maintaining the social and political order of the nation. Due to the increase in number of pending court cases, there is an urgent need to develop tools to automate many of the legal processes with the knowledge of artificial intelligence. In this paper, we employ knowledge extraction techniques, specially the named entity extraction of legal entities within court case judgements. We evaluate several state of the art architectures in the realm of sequence labeling using models trained on a curated dataset of legal texts. We observe that a Bi-LSTM model trained on Flair Embeddings achieves the best results, and we also publish the BIO formatted dataset as part of this paper.

Submitted 3 June, 2023; originally announced June 2023.

Comments: 5 pages, 4 figures

arXiv:2306.01570 [pdf]

Spatio-Temporal Deep Learning-Assisted Reduced Security-Constrained Unit Commitment

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Security-constrained unit commitment (SCUC) is a computationally complex process utilized in power system day-ahead scheduling and market clearing. SCUC is run daily and requires state-of-the-art algorithms to speed up the process. The constraints and data associated with SCUC are both geographically and temporally correlated to ensure the reliability of the solution, which further increases the complexity. In this paper, an advanced machine learning (ML) model is used to study the patterns in power system historical data, which inherently considers both spatial and temporal (ST) correlations in constraints. The ST-correlated ML model is trained to understand spatial correlation by considering graph neural networks (GNN) whereas temporal sequences are studied using long short-term memory (LSTM) networks. The proposed approach is validated on several test systems namely, IEEE 24-Bus system, IEEE-73 Bus system, IEEE 118-Bus system, and synthetic South-Carolina (SC) 500-Bus system. Moreover, B-θ and power transfer distribution factor (PTDF) based SCUC formulations were considered in this research. Simulation results demonstrate that the ST approach can effectively predict generator commitment schedule and classify critical and non-critical lines in the system which are utilized for model reduction of SCUC to obtain computational enhancement without loss in solution quality.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 8 Figures, 5 Tables, 1 Algorithm

arXiv:2305.14452 [pdf, other]

Fourier Neural Operators for Arbitrary Resolution Climate Data Downscaling

Authors: Qidong Yang, Alex Hernandez-Garcia, Paula Harder, Venkatesh Ramesh, Prasanna Sattegeri, Daniela Szwarcman, Campbell D. Watson, David Rolnick

Abstract: Climate simulations are essential in guiding our understanding of climate change and responding to its effects. However, it is computationally expensive to resolve complex climate processes at high spatial resolution. As one way to speed up climate simulations, neural networks have been used to downscale climate variables from fast-running low-resolution simulations, but high-resolution training data are often unobtainable or scarce, greatly limiting accuracy. In this work, we propose a downscaling method based on the Fourier neural operator. It trains with data of a small upsampling factor and then can zero-shot downscale its input to arbitrary unseen high resolution. Evaluated both on ERA5 climate model data and on the Navier-Stokes equation solution data, our downscaling model significantly outperforms state-of-the-art convolutional and generative adversarial downscaling models, both in standard single-resolution downscaling and in zero-shot generalization to higher upsampling factors. Furthermore, we show that our method also outperforms state-of-the-art data-driven partial differential equation solvers on Navier-Stokes equations. Overall, our work bridges the gap between simulation of a physical process and interpolation of low-resolution output, showing that it is possible to combine both approaches and significantly improve upon each other.

Submitted 30 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Presented at the ICLR 2023 workshop on "Tackling Climate Change with Machine Learning"

arXiv:2210.06340 [pdf, other]

Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors

Authors: Vignav Ramesh, Nathan Andrew Chi, Pranav Rajpurkar

Abstract: Current deep learning models trained to generate radiology reports from chest radiographs are capable of producing clinically accurate, clear, and actionable text that can advance patient care. However, such systems all succumb to the same problem: making hallucinated references to non-existent prior reports. Such hallucinations occur because these models are trained on datasets of real-world patient reports that inherently refer to priors. To this end, we propose two methods to remove references to priors in radiology reports: (1) a GPT-3-based few-shot approach to rewrite medical reports without references to priors; and (2) a BioBERT-based token classification approach to directly remove words referring to priors. We use the aforementioned approaches to modify MIMIC-CXR, a publicly available dataset of chest X-rays and their associated free-text radiology reports; we then retrain CXR-RePaiR, a radiology report generation system, on the adapted MIMIC-CXR dataset. We find that our re-trained model--which we call CXR-ReDonE--outperforms previous report generation methods on clinical metrics, achieving an average BERTScore of 0.2351 (2.57% absolute improvement). We expect our approach to be broadly valuable in enabling current radiology report generation systems to be more directly integrated into clinical pipelines.

Submitted 13 October, 2022; v1 submitted 26 September, 2022; originally announced October 2022.

Comments: 13 pages, 1 figure, 11 tables

arXiv:2208.11716 [pdf, other]

doi 10.1126/science.ade5337

Mid-circuit correction of correlated phase errors using an array of spectator qubits

Authors: Kevin Singh, Conor E. Bradley, Shraddha Anand, Vikram Ramesh, Ryan White, Hannes Bernien

Abstract: Scaling up invariably error-prone quantum processors is a formidable challenge. While quantum error correction ultimately promises fault-tolerant operation, the required qubit overhead and error thresholds are daunting, and many codes break down under correlated noise. Recent proposals have suggested a complementary approach based on co-located, auxiliary 'spectator' qubits. These act as in-situ probes of noise, and enable real-time, coherent corrections of the resulting errors on the data qubits. Here, we use an array of cesium spectator qubits to correct correlated phase errors on an array of rubidium data qubits. Crucially, by combining in-sequence readouts, data processing, and feed-forward operations, these correlated errors are suppressed within the execution of the quantum circuit. The protocol is broadly applicable to quantum information platforms, and our approach establishes key tools for scaling neutral-atom quantum processors: mid-circuit readout of atom arrays, real-time processing and feed-forward, and coherent mid-circuit reloading of atomic qubits.

Submitted 29 August, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: 14 pages, 9 figures

arXiv:2208.11563 [pdf]

Contrastive learning-based pretraining improves representation and transferability of diabetic retinopathy classification models

Authors: Minhaj Nur Alam, Rikiya Yamashita, Vignav Ramesh, Tejas Prabhune, Jennifer I. Lim, R. V. P. Chan, Joelle Hallak, Theodore Leng, Daniel Rubin

Abstract: Self supervised contrastive learning based pretraining allows development of robust and generalized deep learning models with small, labeled datasets, reducing the burden of label generation. This paper aims to evaluate the effect of CL based pretraining on the performance of referrable vs non referrable diabetic retinopathy (DR) classification. We have developed a CL based framework with neural style transfer (NST) augmentation to produce models with better representations and initializations for the detection of DR in color fundus images. We compare our CL pretrained model performance with two state of the art baseline models pretrained with Imagenet weights. We further investigate the model performance with reduced labeled training data (down to 10 percent) to test the robustness of the model when trained with small, labeled datasets. The model is trained and validated on the EyePACS dataset and tested independently on clinical data from the University of Illinois, Chicago (UIC). Compared to baseline models, our CL pretrained FundusNet model had higher AUC (CI) values (0.91 (0.898 to 0.930) vs 0.80 (0.783 to 0.820) and 0.83 (0.801 to 0.853) on UIC data). At 10 percent labeled training data, the FundusNet AUC was 0.81 (0.78 to 0.84) vs 0.58 (0.56 to 0.64) and 0.63 (0.60 to 0.66) in baseline models, when tested on the UIC dataset.
CL based pretraining with NST significantly improves DL classification performance, helps the model generalize well (transferable from EyePACS to UIC data), and allows training with small, annotated datasets, therefore reducing ground truth annotation burden of the clinicians.

Submitted 24 August, 2022; originally announced August 2022.

arXiv:2208.07432 [pdf, other]

Arcsine Laws of Light

Authors: V. G. Ramesh, K. J. H. Peters, S. R. K. Rodriguez

Abstract: We demonstrate that light in a coherently driven resonator obeys Lévy's arcsine laws -- a cornerstone of extreme value statistics. This behavior emerges asymptotically in the time-integrated transmitted intensity, an important quantity which is measured by every photodetector. We furthermore demonstrate a universal algebraic convergence to the arcsine laws as the integration time increases, independent of the balance between conservative and non-conservative forces exerted on the light field. Through numerical simulations we verify that the arcsine laws are also obeyed by the light field quadratures, and in a Kerr nonlinear resonator supporting non-Gaussian states of light. Our results are relevant to fundamental studies and technological applications of coherently driven resonators (in e.g., optics, microwave photonics, and acoustics), which in turn open up perspectives for probing emergent statistical structure in new regimes and in systems with memory.

Submitted 15 August, 2022; originally announced August 2022.

Comments: 7 pages, 4 figures

arXiv:2208.06742 [pdf]

Feasibility Layer Aided Machine Learning Approach for Day-Ahead Operations

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Day-ahead operations involve a complex and computationally intensive optimization process to determine the generator commitment schedule and dispatch. The optimization process is a mixed-integer linear program (MILP) also known as security-constrained unit commitment (SCUC). Independent system operators (ISOs) run SCUC daily and require state-of-the-art algorithms to speed up the process. Existing patterns in historical information can be leveraged for model reduction of SCUC, which can provide significant time savings. In this paper, machine learning (ML) based classification approaches, namely logistic regression, neural networks, random forest and K-nearest neighbor, were studied for model reduction of SCUC. The ML was then aided with a feasibility layer (FL) and post-process technique to ensure high-quality solutions. The proposed approach is validated on several test systems, namely the IEEE 24-Bus system, IEEE 73-Bus system, IEEE 118-Bus system, 500-Bus system, and Polish 2383-Bus system. Moreover, model reduction of a stochastic SCUC (SSCUC) was demonstrated utilizing a modified IEEE 24-Bus system with renewable generation. Simulation results demonstrate a high training accuracy in identifying the commitment schedule, while FL and post-process ensure ML predictions do not lead to infeasible solutions, with minimal loss in solution quality.

Submitted 13 August, 2022; originally announced August 2022.

Comments: 10 pages, 9 figures, 8 tables

arXiv:2208.05424 [pdf, other]

Hard-Constrained Deep Learning for Climate Downscaling

Authors: Paula Harder, Alex Hernandez-Garcia, Venkatesh Ramesh, Qidong Yang, Prasanna Sattigeri, Daniela Szwarcman, Campbell Watson, David Rolnick

Abstract: The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and, therefore, often generate coarse-resolution predictions. Statistical downscaling, including super-resolution methods from deep learning, can provide an efficient method of upsampling low-resolution data. However, despite achieving visually compelling results in some cases, such models frequently violate conservation laws when predicting physical variables. In order to conserve physical quantities, here we introduce methods that guarantee statistical constraints are satisfied by a deep learning downscaling model, while also improving their performance according to traditional metrics. We compare different constraining approaches and demonstrate their applicability across different neural architectures as well as a variety of climate and weather data sets. Besides enabling faster and more accurate climate predictions through downscaling, we also show that our novel methodologies can improve super-resolution for satellite data and natural images data sets.

Submitted 29 February, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2205.11443 [pdf, other]

Unsupervised Tokenization Learning

Authors: Anton Kolonin, Vignav Ramesh

Abstract: In the presented study, we discover that the so-called "transition freedom" metric appears superior for unsupervised tokenization purposes in comparison to statistical metrics such as mutual information and conditional probability, providing F-measure scores ranging from 0.71 to 1.0 across explored multilingual corpora. We find that different languages require different offshoots of that metric (such as derivative, variance, and "peak values") for successful tokenization. Larger training corpora do not necessarily result in better tokenization quality, while compressing the models by eliminating statistically weak evidence tends to improve performance. The proposed unsupervised tokenization technique provides quality better than or comparable to lexicon-based ones, depending on the language.

Submitted 15 December, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 16 pages, 9 figures; Paper accepted to the EMNLP 2022 conference

arXiv:2111.14671 [pdf, other]

ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models

Authors: Salva Rühling Cachay, Venkatesh Ramesh, Jason N. S. Cole, Howard Barker, David Rolnick

Abstract: Numerical simulations of Earth's weather and climate require substantial amounts of computation. This has led to a growing interest in replacing subroutines that explicitly compute physical processes with approximate machine learning (ML) methods that are fast at inference time. Within weather and climate models, atmospheric radiative transfer (RT) calculations are especially expensive. This has made them a popular target for neural network-based emulators. However, prior work is hard to compare due to the lack of a comprehensive dataset and standardized best practices for ML benchmarking. To fill this gap, we build a large dataset, ClimART, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed. We also present several novel baselines that indicate shortcomings of datasets and network architectures used in prior work. Download instructions, baselines, and code are available at: https://github.com/RolnickLab/climart

Submitted 29 November, 2021; originally announced November 2021.

Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks

arXiv:2111.09824 [pdf]

Machine Learning Assisted Approach for Security-Constrained Unit Commitment

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Security-constrained unit commitment (SCUC) is solved for power system day-ahead generation scheduling, which is a large-scale mixed-integer linear programming problem and is very computationally intensive. Model reduction of SCUC may bring significant time savings. In this work, a novel approach is proposed to effectively utilize machine learning (ML) to reduce the problem size of SCUC. An ML model using logistic regression (LR) algorithm is proposed and trained with historical nodal demand profiles and the respective commitment schedules. The ML outputs are processed and analyzed to reduce variables and constraints in SCUC. The proposed approach is validated on several standard test systems including IEEE 24-bus system, IEEE 73-bus system, IEEE 118-bus system, synthetic South Carolina 500-bus system and Polish 2383-bus system. Simulation results demonstrate that the use of the prediction from the proposed LR model in SCUC model reduction can substantially reduce the computing time while maintaining solution quality.

Submitted 12 July, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: 6 Pages, 5 Figures, 3 tables, 1 algorithm

arXiv:2111.03814 [pdf]

doi 10.14445/22315381/IJETT-V69I10P216

Compensation of Reactive Power in Grid-Connected Solar PV Array System Using STATCOM and Fixed Capacitor Bank

Authors: CH Venkata Ramesh, A Manjunatha

Abstract: In this article, we propose reactive compensation for the PV integrated grid system using a STATCOM and a fixed capacitor bank. This paper presents a design calculation for a PV integrated grid system with a fixed capacitor and STATCOM. The proposed system is simulated and tested using the MATLAB Simulink software package. The suggested system has been evaluated under a variety of operating circumstances, including changing solar PV array irradiance and changing reactive load power. Detailed simulations and comparisons between the fixed capacitor and STATCOM are presented.

Submitted 6 November, 2021; originally announced November 2021.

Comments: 6 pages, 28 figures. Published in the International Journal of Engineering Trends and Technology (IJETT)

Journal ref: International Journal of Engineering Trends and Technology 69.10(2021):128-136

arXiv:2110.03281 [pdf, other]

Detecting Autism Spectrum Disorders with Machine Learning Models Using Speech Transcripts

Authors: Vikram Ramesh, Rida Assaf

Abstract: Autism spectrum disorder (ASD) can be defined as a neurodevelopmental disorder that affects how children interact, communicate and socialize with others. This disorder can occur in a broad spectrum of symptoms, with varying effects and severity. While there is no permanent cure for ASD, early detection and proactive treatment can substantially improve the lives of many children. Current methods to accurately diagnose ASD are invasive, time-consuming, and tedious. They can also depend on the subjective perspectives of the clinicians involved, including pediatricians, speech pathologists, psychologists, and psychiatrists. New technologies are rapidly emerging that include machine learning models using speech, computer vision from facial, retinal, and brain MRI images of patients to detect this disorder accurately and in a timely manner. Our research focuses on computational linguistics and machine learning using speech data from TalkBank, the world's largest spoken language database. We used data of both ASD and Typical Development (TD) in children from TalkBank to develop machine learning models to accurately predict ASD. More than 50 features were used from specifically two datasets in TalkBank to run our experiments using five different classifiers. Logistic Regression and Random Forest models were found to be the most effective for each of these two main datasets, with an accuracy of 0.75. These experiments confirm that while significant opportunities exist for improving the accuracy, machine learning models can reliably predict ASD status in children for effective diagnosis.

Submitted 7 October, 2021; originally announced October 2021.

arXiv:2110.01616 [pdf, other]

A spatial-photonic Ising machine to solve the two-way number-partitioning problem

Authors: Vikram Ramesh, Vighnesh Natarajan, Anil Prabhakar

Abstract: We evaluate the performance of different algorithms in minimizing the Hamiltonian of a spatial-photonic Ising machine (SPIM). We then encode the number-partitioning problem on the SPIM and adiabatically arrive at good solutions for the problem for over 16000 spins, with a time complexity that only scales linearly with problem size. Finally, we benchmark our machine performance against the classical solver, Gurobi, and also a D-Wave 5000+ quantum annealer. With just one spatial light modulator, and an adiabatic evolution scheme for the phase, our results surpass current state-of-the-art SPIMs. We reduce hardware costs, and can solve larger problems more efficiently.

Submitted 3 October, 2021; originally announced October 2021.

arXiv:2108.04220 [pdf, other]

End-to-end Malaria Diagnosis and 3D Cell Rendering with Deep Learning

Authors: Vignav Ramesh

Abstract: Malaria is a parasitic infection that poses a significant burden on global health. It kills one child every 30 seconds and over one million people annually. If diagnosed in a timely manner, however, most people can be effectively treated with antimalarial therapy. Several deaths due to malaria are byproducts of disparities in the social determinants of health; the current gold standard for diagnosing malaria requires microscopes, reagents, and other equipment that most patients of low socioeconomic brackets do not have access to. In this paper, we propose a convolutional neural network (CNN) architecture that allows for rapid automated diagnosis of malaria (achieving a high classification accuracy of 98%), as well as a deep neural network (DNN) based three-dimensional (3D) modeling algorithm that renders 3D models of parasitic cells in augmented reality (AR). This creates an opportunity to optimize the current workflow for malaria diagnosis and demonstrates potential for deep learning models to improve telemedicine practices and patient health literacy on a global scale.

Submitted 8 July, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures

arXiv:2106.02585 [pdf, other]

A Procedural World Generation Framework for Systematic Evaluation of Continual Learning

Authors: Timm Hess, Martin Mundt, Iuliia Pliushch, Visvanathan Ramesh

Abstract: Several families of continual learning techniques have been proposed to alleviate catastrophic interference in deep neural network training on non-stationary data. However, a comprehensive comparison and analysis of limitations remains largely open due to the inaccessibility to suitable datasets. Empirical examination not only varies immensely between individual works, it further currently relies on contrived composition of benchmarks through subdivision and concatenation of various prevalent static vision datasets. In this work, our goal is to bridge this gap by introducing a computer graphics simulation framework that repeatedly renders only upcoming urban scene fragments in an endless real-time procedural world generation process. At its core lies a modular parametric generative model with adaptable generative factors. The latter can be used to flexibly compose data streams, which significantly facilitates a detailed analysis and allows for effortless investigation of various continual learning schemes.

Submitted 13 December, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Published in Neural Information Processing Systems, Dataset and Benchmarks Track 2021

arXiv:2105.08997 [pdf, other]

When Deep Classifiers Agree: Analyzing Correlations between Learning Order and Image Statistics

Authors: Iuliia Pliushch, Martin Mundt, Nicolas Lupp, Visvanathan Ramesh

Abstract: Although a plethora of architectural variants for deep classification has been introduced over time, recent works have found empirical evidence towards similarities in their training process. It has been hypothesized that neural networks converge not only to similar representations, but also exhibit a notion of empirical agreement on which data instances are learned first. Following in the latter works' footsteps, we define a metric to quantify the relationship between such classification agreement over time, and posit that the agreement phenomenon can be mapped to core statistics of the investigated dataset. We empirically corroborate this hypothesis across the CIFAR10, Pascal, ImageNet and KTH-TIPS2 datasets. Our findings indicate that agreement seems to be independent of specific architectures, training hyper-parameters or labels, albeit follows an ordering according to image statistics.

Submitted 19 July, 2022; v1 submitted 19 May, 2021; originally announced May 2021.

Comments: Accepted for publication at ECCV 2022. Version includes supplementary material

arXiv:2105.08147 [pdf, other]

COVID-19 Lung Lesion Segmentation Using a Sparsely Supervised Mask R-CNN on Chest X-rays Automatically Computed from Volumetric CTs

Authors: Vignav Ramesh, Blaine Rister, Daniel L. Rubin

Abstract: Chest X-rays of coronavirus disease 2019 (COVID-19) patients are frequently obtained to determine the extent of lung disease and are a valuable source of data for creating artificial intelligence models. Most work to date assessing disease severity on chest imaging has focused on segmenting computed tomography (CT) images; however, given that CTs are performed much less frequently than chest X-rays for COVID-19 patients, automated lung lesion segmentation on chest X-rays could be clinically valuable. There currently exists a universal shortage of chest X-rays with ground truth COVID-19 lung lesion annotations, and manually contouring lung opacities is a tedious, labor-intensive task. To accelerate severity detection and augment the amount of publicly available chest X-ray training data for supervised deep learning (DL) models, we leverage existing annotated CT images to generate frontal projection "chest X-ray" images for training COVID-19 chest X-ray models. In this paper, we propose an automated pipeline for segmentation of COVID-19 lung lesions on chest X-rays comprised of a Mask R-CNN trained on a mixed dataset of open-source chest X-rays and coronal X-ray projections computed from annotated volumetric CTs. On a test set containing 40 chest X-rays of COVID-19 positive patients, our model achieved IoU scores of 0.81 $\pm$ 0.03 and 0.79 $\pm$ 0.03 when trained on a dataset of 60 chest X-rays and on a mixed dataset of 10 chest X-rays and 50 projections from CTs, respectively. Our model far outperforms current baselines with limited supervised training and may assist in automated COVID-19 severity quantification on chest X-rays.

Submitted 19 May, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: 8 pages, 5 figures

arXiv:2105.00830 [pdf]

Natural Language Generation Using Link Grammar for General Conversational Intelligence

Authors: Vignav Ramesh, Anton Kolonin

Abstract: Many current artificial general intelligence (AGI) and natural language processing (NLP) architectures do not possess general conversational intelligence--that is, they either do not deal with language or are unable to convey knowledge in a form similar to the human language without manual, labor-intensive methods such as template-based customization. In this paper, we propose a new technique to automatically generate grammatically valid sentences using the Link Grammar database. This natural language generation method far outperforms current state-of-the-art baselines and may serve as the final component in a proto-AGI question answering pipeline that understandably handles natural language material.

Submitted 19 April, 2021; originally announced May 2021.

Comments: 17 pages, 5 figures

arXiv:2104.06788 [pdf, other]

Neural Architecture Search of Deep Priors: Towards Continual Learning without Catastrophic Interference

Authors: Martin Mundt, Iuliia Pliushch, Visvanathan Ramesh

Abstract: In this paper we analyze the classification performance of neural network structures without parametric inference. Making use of neural architecture search, we empirically demonstrate that it is possible to find random weight architectures, a deep prior, that enables a linear classification to perform on par with fully trained deep counterparts. Through ablation experiments, we exclude the possibility of winning a weight initialization lottery and confirm that suitable deep priors do not require additional inference. In an extension to continual learning, we investigate the possibility of catastrophic interference free incremental learning. Under the assumption of classes originating from the same data distribution, a deep prior found on only a subset of classes is shown to allow discrimination of further classes through training of a simple linear classifier.

Submitted 14 April, 2021; originally announced April 2021.

Comments: Accepted for publication at CVPR-W 2021, Workshop on Continual Learning in Computer Vision (CLVision). First two authors have equal contribution

arXiv:2103.13321 [pdf]

Network Reconfiguration Impact on Renewable Energy System and Energy Storage System in Day-Ahead Scheduling

Authors: Arun Venkatesh Ramesh, Xingpeng Li

Abstract: Renewable energy sources (RES) have gained significant interest in recent years. However, due to favourable weather conditions, RES are installed in remote locations with limited transmission capacity. As a result, this can lead to major curtailments of the free resource when the network is congested. Therefore, energy storage system (ESS) is considered as a viable solution to store energy and address the intermittent nature of RES, though ESS is often distributed and may not be geographically close to RES. Therefore, ESS may also suffer from limited transmission capacity due to network congestion. Currently, grid operators overlook network flexibility as a congestion management tool in day-ahead scheduling. This paper addresses these issues and studies the benefits of introducing network reconfiguration (NR) as a preventive and corrective action for transmission flexibility in day-ahead stochastic security-constrained unit-commitment (SSCUC-PC) while considering a multi-scenario RES output. Simulation results demonstrate that NR can lower total system cost, reduce RES curtailments and utilize ESS for better impact by alleviating network congestion in both base-case and post-contingency networks.

Submitted 11 January, 2021; originally announced March 2021.

arXiv:2009.01797 [pdf, other]

A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning

Authors: Martin Mundt, Yongwon Hong, Iuliia Pliushch, Visvanathan Ramesh

Abstract: Current deep learning methods are regarded as favorable if they empirically perform well on dedicated test sets. This mentality is seamlessly reflected in the resurfacing area of continual learning, where consecutively arriving data is investigated. The core challenge is framed as protecting previously acquired representations from being catastrophically forgotten. However, comparison of individual methods is nevertheless performed in isolation from the real world by monitoring accumulated benchmark test set performance. The closed world assumption remains predominant, i.e. models are evaluated on data that is guaranteed to originate from the same distribution as used for training. This poses a massive challenge as neural networks are well known to provide overconfident false predictions on unknown and corrupted instances. In this work we critically survey the literature and argue that notable lessons from open set recognition, identifying unknown examples outside of the observed set, and the adjacent field of active learning, querying data to maximize the expected performance gain, are frequently overlooked in the deep learning era. Hence, we propose a consolidated view to bridge continual learning, active learning and open set recognition in deep neural networks. Finally, the established synergies are supported empirically, showing joint improvement in alleviating catastrophic forgetting, querying data, selecting task orders, while exhibiting robust open world application.

Submitted 23 January, 2023; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: Accepted for publication at Neural Networks in open-access form. Final version available at: https://doi.org/10.1016/j.neunet.2023.01.014

arXiv:2007.10142 [pdf]

An Accelerated-Decomposition Approach for Security-Constrained Unit Commitment with Corrective Network Reconfiguration- Part II: Results and Discussion

Authors: Arun Venkatesh Ramesh, Xingpeng Li, Kory W. Hedman

Abstract: This paper presents a novel approach to handle the computational complexity in security-constrained unit commitment (SCUC) with corrective network reconfiguration (CNR) to harness the flexibility in transmission networks. This is achieved with consideration of scalability through decomposing the SCUC/SCUC-CNR formulation and then fast screening non-critical sub-problems. This is compared against the extensive formulations of SCUC and SCUC-CNR to show the advantages of the proposed typical-decomposition and accelerated-decomposition approaches to SCUC and SCUC-CNR respectively. Simulation results on the IEEE 24-bus system show that the proposed methods are substantially faster without loss of solution quality. The proposed accelerated-decomposition approaches can be implemented for large power systems, as they perform well in the scalability tests on the IEEE 73-bus system and the Polish system when compared against the respective extensive formulations and typical-decomposition approaches. Overall, a dynamic post-contingency network can substantially alleviate network congestion and lead to a lower optimal cost.

Submitted 17 July, 2020; originally announced July 2020.

Comments: 8 pages, 9 figures. arXiv admin note: text overlap with arXiv:1912.01764

Showing 1–50 of 66 results for author: Ramesh, V