-
Enhancing learning in spiking neural networks through neuronal heterogeneity and neuromodulatory signaling
Authors:
Alejandro Rodriguez-Garcia,
Jie Mei,
Srikanth Ramaswamy
Abstract:
Recent progress in artificial intelligence (AI) has been driven by insights from neuroscience, particularly with the development of artificial neural networks (ANNs). This has significantly enhanced the replication of complex cognitive tasks such as vision and natural language processing. Despite these advances, ANNs struggle with continual learning, adaptable knowledge transfer, robustness, and resource efficiency, capabilities that biological systems handle seamlessly. Specifically, ANNs often overlook the functional and morphological diversity of the brain, which limits their computational capabilities. Furthermore, incorporating cell-type-specific neuromodulatory effects into ANNs with neuronal heterogeneity could enable learning at two spatial scales: spiking behavior at the neuronal level and synaptic plasticity at the circuit level, thereby potentially enhancing their learning abilities. In this article, we summarize recent bio-inspired models, learning rules, and architectures, and propose a biologically informed framework for enhancing ANNs. Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors and dendritic compartments to simulate the morphological and functional diversity of neuronal computations. Finally, we outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, balances bioinspiration and complexity, and provides scalable solutions for pressing AI challenges such as continual learning, adaptability, robustness, and resource efficiency.
Submitted 11 November, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
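The neuron-level heterogeneity this abstract refers to can be illustrated with a minimal sketch: a population of leaky integrate-and-fire (LIF) neurons that share one equation but differ in their membrane time constant `tau`. This is not the authors' framework; the class `LIFPopulation`, the forward-Euler update, and all parameter values are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class LIFPopulation:
    """Leaky integrate-and-fire neurons with per-neuron (heterogeneous) time constants."""
    tau: list              # membrane time constant of each neuron, in ms
    v_th: float = 1.0      # spike threshold (dimensionless)
    v_reset: float = 0.0   # membrane potential right after a spike
    dt: float = 1.0        # Euler integration step, in ms
    v: list = field(init=False)

    def __post_init__(self):
        self.v = [0.0] * len(self.tau)  # all neurons start at rest

    def step(self, current):
        """Advance one time step and return a 0/1 spike flag for each neuron."""
        spikes = []
        for i, (tau_i, i_in) in enumerate(zip(self.tau, current)):
            # Forward-Euler step of tau * dv/dt = -v + I
            self.v[i] += self.dt * (-self.v[i] + i_in) / tau_i
            if self.v[i] >= self.v_th:
                spikes.append(1)
                self.v[i] = self.v_reset
            else:
                spikes.append(0)
        return spikes


def count_spikes(pop, current, steps):
    """Drive the population with a constant input and count spikes per neuron."""
    counts = [0] * len(pop.tau)
    for _ in range(steps):
        for i, s in enumerate(pop.step(current)):
            counts[i] += s
    return counts


pop = LIFPopulation(tau=[5.0, 20.0])        # one fast and one slow neuron
print(count_spikes(pop, [1.5, 1.5], 100))   # [20, 4]
```

With the same constant input, the `tau = 5` ms neuron fires 20 times in 100 steps and the `tau = 20` ms neuron only 4 times, so a heterogeneous population responds on several timescales at once.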
-
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
Authors:
Shashi Kumar,
Srikanth Madikeri,
Juan Zuluaga-Gomez,
Iuliia Thorbecke,
Esaú Villatoro-Tello,
Sergio Burdisso,
Petr Motlicek,
Karthik Pandia,
Aravind Ganapathiraju
Abstract:
In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achieved by integrating task-specific tokens into the reference text during ASR model training, streamlining inference and eliminating the need for separate NLP models. In addition to ASR, we conduct experiments on three different tasks: speaker change detection, endpointing, and NER. Our experiments on a public and a private dataset show that the proposed method improves ASR by up to 7.7% in relative WER while outperforming the cascaded pipeline approach in individual task performance. Our code is publicly available: https://github.com/idiap/tokenverse-unifying-speech-nlp
Submitted 8 October, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
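The core idea in this abstract, task-specific tokens interleaved with the words of the ASR reference so that one transducer emits the transcript and the task labels together, can be sketched as a pair of helpers: one adds tokens to build training targets, the other strips them back out of a decoded hypothesis. The token spellings (`[SC]`, `[EP]`, `[NE]`, `[/NE]`) and helper names are hypothetical; the paper's actual token inventory and placement rules may differ.

```python
SC = "[SC]"                          # hypothetical speaker-change token
EP = "[EP]"                          # hypothetical endpoint token
NE_OPEN, NE_CLOSE = "[NE]", "[/NE]"  # hypothetical named-entity delimiters


def add_task_tokens(words, speaker_changes, endpoints, entities):
    """Build a training reference with task tokens interleaved among the words.

    speaker_changes: indices of words that start a new speaker turn
    endpoints: indices of words after which the utterance ends
    entities: (start, end) word spans, end exclusive
    """
    starts = {s for s, _ in entities}
    ends = {e for _, e in entities}
    out = []
    for i, word in enumerate(words):
        if i in speaker_changes:
            out.append(SC)
        if i in starts:
            out.append(NE_OPEN)
        out.append(word)
        if i + 1 in ends:
            out.append(NE_CLOSE)
        if i in endpoints:
            out.append(EP)
    return " ".join(out)


def parse_task_tokens(hypothesis):
    """Split a decoded hypothesis back into a plain transcript and task labels."""
    words, speaker_changes, endpoints, entities = [], [], [], []
    open_at = None
    for tok in hypothesis.split():
        if tok == SC:
            speaker_changes.append(len(words))
        elif tok == NE_OPEN:
            open_at = len(words)
        elif tok == NE_CLOSE:
            if open_at is not None:  # ignore an unmatched closing token
                entities.append((open_at, len(words)))
                open_at = None
        elif tok == EP:
            endpoints.append(len(words) - 1)
        else:
            words.append(tok)
    return {"words": words, "speaker_changes": speaker_changes,
            "endpoints": endpoints, "entities": entities}


ref = add_task_tokens(["hello", "this", "is", "anna", "smith", "hi", "anna"],
                      speaker_changes={5}, endpoints={4, 6}, entities=[(3, 5)])
print(ref)  # hello this is [NE] anna smith [/NE] [EP] [SC] hi anna [EP]
```

Because the labels live in the same token stream as the words, no separate NLP model is needed at inference: a single parse of the hypothesis recovers every task output.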
-
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
Authors:
Shashi Kumar,
Srikanth Madikeri,
Juan Zuluaga-Gomez,
Esaú Villatoro-Tello,
Iuliia Thorbecke,
Petr Motlicek,
Manjunath K E,
Aravind Ganapathiraju
Abstract:
Self-supervised pretrained models exhibit competitive performance in automatic speech recognition after fine-tuning, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are trained with full attention context. In this paper, we introduce XLSR-Transducer, in which the XLSR-53 model is used as the encoder in a transducer setup. Our experiments on the AMI dataset reveal that the XLSR-Transducer achieves a 4% absolute WER improvement over Whisper large-v2 and 8% over a Zipformer transducer model trained from scratch. To enable streaming capabilities, we investigate different attention masking patterns in the self-attention computation of transformer layers within the XLSR-53 model. We validate XLSR-Transducer on AMI and 5 languages from CommonVoice under low-resource scenarios. Finally, with the introduction of attention sinks, we reduce the left context by half while achieving a relative 12% improvement in WER.
Submitted 8 October, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
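Streaming requires restricting which frames each self-attention query may see. A minimal sketch of one common pattern, chunked attention with a fixed number of left chunks plus a few always-visible "sink" frames at the start of the sequence, is shown below. The function name and parameters are hypothetical, and the masking patterns actually evaluated in the paper may differ in detail.

```python
def streaming_attention_mask(num_frames, chunk_size, left_chunks, num_sinks=0):
    """Boolean mask where mask[q][k] is True if query frame q may attend to key frame k.

    Each query sees its own chunk plus `left_chunks` previous chunks (no look-ahead
    beyond the current chunk), and the first `num_sinks` frames stay visible to
    every query as attention sinks.
    """
    mask = [[False] * num_frames for _ in range(num_frames)]
    for q in range(num_frames):
        chunk = q // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)
        end = min(num_frames, (chunk + 1) * chunk_size)
        for k in range(start, end):
            mask[q][k] = True
        for k in range(min(num_sinks, num_frames)):
            mask[q][k] = True  # sink frames remain visible however short the left context is
    return mask


for row in streaming_attention_mask(8, chunk_size=2, left_chunks=1, num_sinks=1):
    print("".join("x" if allowed else "." for allowed in row))
```

In practice such a boolean mask is usually converted into an additive bias (0 where attention is allowed, a large negative value where it is blocked) before the softmax.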
-
Simulations of cluster ultra-diffuse galaxies in MOND
Authors:
Srikanth T. Nagesh,
Jonathan Freundlich,
Benoit Famaey,
Michal Bílek,
Graeme Candlish,
Rodrigo Ibata,
Oliver Müller
Abstract:
Ultra-diffuse galaxies (UDGs) in the Coma cluster have velocity dispersion profiles that are in full agreement with the predictions of Modified Newtonian Dynamics (MOND) in isolation. However, the external field effect (EFE) from the cluster seriously deteriorates this agreement. It has been suggested that this could be related to the fact that UDGs are out-of-equilibrium objects whose stars have been heated by the cluster tides or that they recently fell onto the cluster on radial orbits, such that their velocity dispersion may not reflect the EFE at their instantaneous distance from the cluster center. Here, we simulate UDGs within the Coma cluster in MOND, using the Phantom of Ramses (PoR) code, and show that if UDGs are initially at equilibrium within the cluster, tides are not sufficient to increase their velocity dispersions to values as high as the observed ones. On the other hand, if they are on a first radial infall onto the cluster, they can keep high velocity dispersions without being destroyed until their first pericentric passage. We conclude that, without alterations such as a screening of the EFE in galaxy clusters or much higher baryonic masses than currently estimated, in the MOND context UDGs must be out-of-equilibrium objects on their first infall onto the cluster.
Submitted 3 July, 2024;
originally announced July 2024.
-
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
Authors:
Mehant Kammakomati,
Sameer Pimparkhede,
Srikanth Tamilselvam,
Prince Kumar,
Pushpak Bhattacharyya
Abstract:
Recent work shows that Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. In the code domain, by contrast, constraints expressed in code are widely used to maintain the integrity of code written in Domain-Specific Languages (DSLs) such as JSON and YAML, which are common in enterprise system-level programming tasks. Given that LLMs are increasingly used for system-level code tasks, evaluating whether they can comprehend these code constraints is crucial. However, no work has been done to evaluate their controllability over code constraints. Hence, we introduce ConCodeEval, a first-of-its-kind benchmark with two novel tasks for code constraints across five representations. Our findings suggest that language models struggle with code constraints. Code languages that perform excellently for normal code tasks do not perform well when the same languages represent fine-grained constraints.
Submitted 24 March, 2025; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Sequential Editing for Lifelong Training of Speech Recognition Models
Authors:
Devang Kulshreshtha,
Saket Dingliwal,
Brady Houston,
Nikolaos Pappas,
Srikanth Ronanki
Abstract:
Automatic Speech Recognition (ASR) traditionally assumes known domains, but adding data from a new domain raises concerns about the computational inefficiency of retraining models on both existing and new domains. Fine-tuning solely on the new domain risks Catastrophic Forgetting (CF). To address this, Lifelong Learning (LLL) algorithms have been proposed for ASR. Prior research has explored techniques such as Elastic Weight Consolidation, Knowledge Distillation, and Replay, all of which require either additional parameters or access to prior domain data. We propose Sequential Model Editing as a novel method to continually learn new domains in ASR systems. Unlike previous methods, our approach requires neither access to prior datasets nor extra parameters. Our study demonstrates up to 15% Word Error Rate Reduction (WERR) over the fine-tuning baseline, and superior efficiency over other LLL techniques, on the CommonVoice English multi-accent dataset.
Submitted 18 September, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
DocCGen: Document-based Controlled Code Generation
Authors:
Sameer Pimparkhede,
Mehant Kammakomati,
Srikanth Tamilselvam,
Prince Kumar,
Ashok Pon Kumar,
Pushpak Bhattacharyya
Abstract:
Recent developments show that Large Language Models (LLMs) produce state-of-the-art performance on natural language (NL) to code generation for resource-rich general-purpose languages like C++, Java, and Python. However, their practical usage for structured domain-specific languages (DSLs) such as YAML and JSON is limited by domain-specific schemas, grammar, and customizations generally unseen by LLMs during pre-training. Efforts have been made to mitigate this challenge via in-context learning with relevant examples or by fine-tuning. However, these approaches suffer from problems such as limited DSL samples and prompt sensitivity, whereas enterprises typically maintain good documentation of their DSLs. Therefore, we propose DocCGen, a framework that leverages this rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process. First, it detects the correct libraries using the library documentation that best matches the NL query. Then, it uses schema rules extracted from the documentation of these libraries to constrain the decoding. We evaluate our framework on two complex structured languages, Ansible YAML and Bash commands, in two settings: Out-of-domain (OOD) and In-domain (ID). Our extensive experiments show that DocCGen consistently improves different-sized language models across all six evaluation metrics, reducing syntactic and semantic errors in structured code. We plan to open-source the datasets and code to motivate research in constrained code generation.
Submitted 3 July, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
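The two-step process in this abstract (retrieve the matching library documentation, then constrain decoding with schema rules taken from it) can be sketched in a few lines. Here retrieval is plain word overlap and "decoding" is a single choice among scored candidate keys; both are simplifying assumptions. The documentation snippets and key sets are hypothetical stand-ins, not DocCGen's actual retriever or schema grammar.

```python
import re

# Hypothetical documentation snippets, one per library (Ansible module).
DOCS = {
    "ansible.builtin.copy": "copy files to remote locations src dest mode owner",
    "ansible.builtin.apt": "manage apt packages install remove name state update_cache",
}
# Hypothetical schema rules: the keys each module's documentation allows.
SCHEMAS = {
    "ansible.builtin.copy": {"src", "dest", "mode", "owner"},
    "ansible.builtin.apt": {"name", "state", "update_cache"},
}


def tokenize(text):
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve_library(query):
    """Step 1: pick the library whose documentation shares the most words with the query."""
    q = tokenize(query)
    return max(DOCS, key=lambda lib: len(q & tokenize(DOCS[lib])))


def constrained_pick(candidates, allowed):
    """Step 2: from (key, score) proposals, emit the best key the schema permits."""
    legal = [(key, score) for key, score in candidates if key in allowed]
    if not legal:
        raise ValueError("no schema-compliant candidate")
    return max(legal, key=lambda ks: ks[1])[0]


lib = retrieve_library("install the nginx package with apt")
proposals = [("package", 0.9), ("name", 0.7), ("dest", 0.5)]  # e.g. model scores
print(lib, constrained_pick(proposals, SCHEMAS[lib]))  # ansible.builtin.apt name
```

The unconstrained top candidate `package` is rejected because the hypothetical schema for `ansible.builtin.apt` does not list it, so the highest-scoring legal key, `name`, is emitted instead.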
-
On Improving Error Resilience of Neural End-to-End Speech Coders
Authors:
Kishan Gupta,
Nicola Pia,
Srikanth Korse,
Andreas Brendel,
Guillaume Fuchs,
Markus Multrus
Abstract:
Error-resilience tools such as Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential for reliable speech communication in applications such as Voice over Internet Protocol (VoIP), where packets are frequently delayed or lost. End-to-end neural speech codecs have recently risen to prominence owing to their ability to transmit speech at low bitrates, but little consideration has been given to their error resilience in a real system. The recently introduced Neural End-to-End Speech Codec (NESC) can reproduce high-quality natural speech at low bitrates. We extend its robustness to packet losses by adding a low-complexity network that predicts the codebook indices in latent space. Furthermore, we propose a method to add an in-band FEC at an additional bitrate of 0.8 kbps. Both subjective and objective assessments indicate the effectiveness of the proposed methods and demonstrate that coupling PLC and FEC provides significant robustness against packet losses.
Submitted 13 June, 2024;
originally announced June 2024.
-
Reinforcement Learning Based Escape Route Generation in Low Visibility Environments
Authors:
Hari Srikanth
Abstract:
Structure fires are responsible for the majority of fire-related deaths nationwide. To assist with the rapid evacuation of trapped people, this paper proposes a system that determines optimal search paths for firefighters and exit paths for civilians in real time based on environmental measurements. A proposed solution to low-visibility mapping is tested using a LiDAR mapping system whose output is evaluated and verified against a trust range derived from sonar and smoke-concentration data. These independent point clouds are then used to create distinct maps, which are merged using a RANSAC-based alignment methodology and simplified into a visibility graph. Temperature and humidity data are then used to label each node with a danger score, creating an environment tensor. After demonstrating how a Linear Function Approximation-based Natural Policy Gradient RL methodology outperforms more complex competitors with respect to robustness and speed, this paper outlines two systems (savior and refugee) that process the environment tensor to create safe rescue and escape routes, respectively.
Submitted 27 May, 2024;
originally announced June 2024.
-
Large language models for generating rules, yay or nay?
Authors:
Shangeetha Sivasothy,
Scott Barnett,
Rena Logothetis,
Mohamed Abdelrazek,
Zafaryab Rasool,
Srikanth Thudumu,
Zac Brannelly
Abstract:
Engineering safety-critical systems such as medical devices and digital health intervention systems is complex and requires long-term engagement with subject-matter experts (SMEs) to capture the systems' expected behaviour. In this paper, we present a novel approach that leverages Large Language Models (LLMs), such as GPT-3.5 and GPT-4, as a potential world model to accelerate the engineering of software systems. This approach uses LLMs to generate logic rules, which SMEs can then review and inform before deployment. We evaluate our approach using a medical rule set created from the pandemic intervention monitoring system in collaboration with medical professionals during COVID-19. Our experiments show that 1) LLMs have a world model that bootstraps implementation, 2) LLMs generated fewer rules than experts, and 3) LLMs do not have the capacity to generate thresholds for each rule. Our work shows how LLMs augment the requirements elicitation process by providing access to a world model for domains.
Submitted 10 June, 2024;
originally announced June 2024.
-
CASE: Efficient Curricular Data Pre-training for Building Assistive Psychology Expert Models
Authors:
Sarthak Harne,
Monjoy Narayan Choudhury,
Madhav Rao,
TK Srikanth,
Seema Mehrotra,
Apoorva Vashisht,
Aarushi Basu,
Manjit Sodhi
Abstract:
The limited availability of psychologists necessitates efficient identification of individuals requiring urgent mental healthcare. This study explores the use of Natural Language Processing (NLP) pipelines to analyze text data from online mental health forums used for consultations. By analyzing forum posts, these pipelines can flag users who may require immediate professional attention. Two crucial challenges in this domain are data privacy and data scarcity. To address them, we propose using readily available curricular texts from institutes specializing in mental health to pre-train the NLP pipelines, which helps us mimic the training process of a psychologist. Our work presents CASE-BERT, which flags potential mental health disorders based on forum text. CASE-BERT outperforms existing methods, achieving F1 scores of 0.91 for Depression and 0.88 for Anxiety, two of the most commonly reported mental health disorders. Our code and data are publicly available.
Submitted 2 October, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges
Authors:
Hari Srikanth
Abstract:
Neural-network-based approximations of the value function form the core of leading policy-based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). While this adds significant value in very complex environments, we note that in environments with sufficiently small state and action spaces, a computationally expensive neural network architecture offers marginal improvement over simpler value approximation methods. We present an implementation of Natural Actor-Critic algorithms with actor updates through Natural Policy Gradient methods. This paper proposes that Natural Policy Gradient (NPG) methods with Linear Function Approximation as a paradigm for value approximation may surpass the performance and speed of neural-network-based models such as TRPO and PPO within these environments. On the Reinforcement Learning benchmarks Cart Pole and Acrobot, we observe that our algorithm trains much faster than complex neural network architectures and obtains an equivalent or better result. This allows us to recommend NPG methods with Linear Function Approximation over TRPO and PPO for both traditional and sparse-reward low-dimensional problems.
Submitted 27 May, 2024;
originally announced May 2024.
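To make the natural policy gradient (NPG) update concrete, the sketch below performs exact NPG steps for a linear softmax policy on a one-state, three-armed bandit: it forms the Fisher matrix from the compatible features psi(a) = phi(a) - E_pi[phi], solves F w = g, and moves theta along w. This is a simplified illustration under stated assumptions, not the paper's Natural Actor-Critic: exact advantages stand in for a learned linear critic, a small ridge term keeps F invertible, and all names and values are hypothetical.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def npg_step(theta, features, rewards, lr=0.5, ridge=1e-3):
    """One exact NPG step for pi(a) proportional to exp(theta . phi(a)) on a one-state bandit."""
    d = len(theta)
    pi = softmax([sum(t * f for t, f in zip(theta, phi)) for phi in features])
    mean_phi = [sum(p * phi[j] for p, phi in zip(pi, features)) for j in range(d)]
    baseline = sum(p * r for p, r in zip(pi, rewards))  # exact value of the current policy
    # Compatible features psi(a) = phi(a) - E_pi[phi] equal the score grad log pi(a).
    psi = [[phi[j] - mean_phi[j] for j in range(d)] for phi in features]
    # Policy gradient g = E_pi[psi(a) * advantage(a)].
    grad = [sum(p * (r - baseline) * ps[j] for p, r, ps in zip(pi, rewards, psi))
            for j in range(d)]
    # Fisher information F = E_pi[psi psi^T], plus a small ridge so it stays invertible.
    fisher = [[sum(p * ps[i] * ps[j] for p, ps in zip(pi, psi)) + (ridge if i == j else 0.0)
               for j in range(d)] for i in range(d)]
    w = solve(fisher, grad)  # natural gradient direction F^{-1} g
    return [t + lr * wi for t, wi in zip(theta, w)]


theta = [0.0, 0.0, 0.0]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot arm features
for _ in range(20):
    theta = npg_step(theta, features, rewards=[0.0, 1.0, 0.2])
print(softmax(theta))  # probability mass concentrates on arm 1
```

Preconditioning by F^{-1} makes the step size roughly independent of how peaked the policy already is, which is one reason NPG pairs well with cheap linear features in small state and action spaces.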
-
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Authors:
Mahsa Khoshnoodi,
Vinija Jain,
Mingye Gao,
Malavika Srikanth,
Aman Chadha
Abstract:
Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category's underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of techniques in LLMs and provide guidance for future research directions in this critical area of natural language processing.
Submitted 24 May, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
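Speculative decoding, one of the categories this survey covers, can be illustrated with a minimal greedy variant: a cheap draft model proposes `k` tokens, the target model verifies them left to right, and the first mismatch is replaced by the target's own token. The toy "models" below are arbitrary deterministic functions and all names are hypothetical; the sketch also omits the bonus token that full implementations take when every draft token is accepted.

```python
def greedy_decode(model, prompt, max_new):
    """Baseline: call the model once per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target, draft, prompt, max_new, k=4):
    """Greedy speculative decoding.

    Returns (tokens, passes), where `passes` counts target verification rounds.
    In a real system each round is one batched forward pass of the target over
    all k draft positions; here the target is simply called per position.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The target checks the proposal left to right.
        passes += 1
        for token in proposal:
            if len(out) - len(prompt) >= max_new:
                break
            expected = target(out)
            out.append(expected)   # always keep the target's own choice
            if expected != token:  # first mismatch: discard the rest of the draft
                break
    return out, passes


def target(prefix):  # toy "large" model
    return (prefix[-1] * 3 + 1) % 10


def draft(prefix):   # toy "small" model that is wrong after a 0
    return 5 if prefix[-1] == 0 else target(prefix)


tokens, passes = speculative_decode(target, draft, [1], max_new=12)
print(tokens == greedy_decode(target, [1], 12), passes)  # True 3
```

Because every kept token is the target's own greedy choice, the output matches plain greedy decoding with the target, while the number of sequential target rounds drops from 12 to 3 in this toy run.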
-
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Authors:
Raghuveer Peri,
Sai Muralidhar Jayanthi,
Srikanth Ronanki,
Anshu Bhatia,
Karel Mundnich,
Saket Dingliwal,
Nilaksh Das,
Zejiang Hou,
Goeric Huybrechts,
Srikanth Vishnubhotla,
Daniel Garcia-Romero,
Sundararajan Srinivasan,
Kyu J Han,
Katrin Kirchhoff
Abstract:
Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remain largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we design algorithms that can generate adversarial examples to jailbreak SLMs in both white-box and black-box attack settings without human involvement. Additionally, we propose countermeasures to thwart such jailbreaking attacks. Our models, trained on dialog data with speech instructions, achieve state-of-the-art performance on the spoken question-answering task, scoring over 80% on both safety and helpfulness metrics. Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10% respectively when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories. However, we demonstrate that our proposed countermeasures significantly reduce the attack success rate.
Submitted 14 May, 2024;
originally announced May 2024.
-
SpeechVerse: A Large-scale Generalizable Audio Language Model
Authors:
Nilaksh Das,
Saket Dingliwal,
Srikanth Ronanki,
Rohit Paturi,
Zhaocheng Huang,
Prashant Mathur,
Jie Yuan,
Dhanush Bekal,
Xing Niu,
Sai Muralidhar Jayanthi,
Xilai Li,
Karel Mundnich,
Monica Sunkara,
Sravan Bodapati,
Sundararajan Srinivasan,
Kyu J Han,
Katrin Kirchhoff
Abstract:
Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.
Submitted 24 March, 2025; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Untangling individual cation roles in rock salt high-entropy oxides
Authors:
Saeed S. I. Almishal,
Jacob T. Sivak,
George N. Kotsonis,
Yueze Tan,
Matthew Furst,
Dhiya Srikanth,
Vincent H. Crespi,
Venkatraman Gopalan,
John T. Heron,
Long-Qing Chen,
Christina M. Rost,
Susan B. Sinnott,
Jon-Paul Maria
Abstract:
We unravel the distinct roles each cation plays in phase evolution, stability, and properties within the Mg1/5Co1/5Ni1/5Cu1/5Zn1/5O high-entropy oxide (HEO) by integrating experimental findings, thermodynamic analyses, and first-principles predictions. Our approach is to sequentially remove one cation at a time from the five-component high-entropy oxide to create five four-component derivatives. Bulk synthesis experiments indicate that Mg, Ni, and Co act as rock salt phase stabilizers, whereas only Mg and Ni enthalpically enhance single-phase rock salt stability in thin film growth; synthesis conditions dictate whether Co is a rock salt phase stabilizer or destabilizer. By examining the competing phases and oxidation state preferences using pseudo-binary phase diagrams and first-principles calculations, we resolve the stability differences between bulk and thin film for all compositions. We systematically explore the sensitivity of HEO macroscopic properties to cation selection using both predicted and measured optical spectra. This study establishes a framework for understanding high-entropy oxide synthesizability and properties on a per-cation basis that is broadly applicable to tailoring functional property design in other high-entropy materials.
Submitted 13 May, 2024;
originally announced May 2024.
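A standard quantity behind the "high-entropy" label is the ideal configurational entropy of the mixed cation sublattice, S_config = -R * sum(x_i ln x_i). The sketch below assumes ideal random mixing, an idealization that the paper's thermodynamic analyses go beyond, and compares the equimolar five-cation oxide with one of its four-cation derivatives. The function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def ideal_config_entropy(fractions):
    """Ideal configurational entropy per mole of cation sites: -R * sum(x ln x)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("site fractions must sum to 1")
    return -R * sum(x * math.log(x) for x in fractions if x > 0)


five = ideal_config_entropy([1 / 5] * 5)  # Mg, Co, Ni, Cu, Zn in equal amounts
four = ideal_config_entropy([1 / 4] * 4)  # one cation removed
print(round(five, 2), round(four, 2))     # 13.38 11.53
```

Under this idealization, removing one cation lowers the value from R ln 5 to R ln 4 per mole of cation sites; the abstract's per-cation findings concern the enthalpic and kinetic factors that this single number does not capture.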
-
DisBeaNet: A Deep Neural Network to augment Unmanned Surface Vessels for maritime situational awareness
Authors:
Srikanth Vemula,
Eulises Franco,
Michael Frye
Abstract:
Intelligent detection and tracking of vessels at sea play a significant role in traffic avoidance for unmanned surface vessels (USVs). Current traffic-avoidance software relies mainly on the Automatic Identification System (AIS) and radar to track other vessels and avoid collisions, acting as a typical perception system for detecting targets. However, in a contested environment, emitting radar energy also makes the USV vulnerable to detection by adversaries. Deactivating these radio-frequency transmitting sources will increase the threat of detection and degrade the USV's ability to monitor shipping traffic in the vicinity. Therefore, this paper presents a novel, low-cost visual perception system for detecting and tracking vessels in the maritime environment, built on a deep learning framework and an onboard camera with passive sensing capabilities. The neural network, DisBeaNet, detects and tracks vessels and estimates each vessel's distance and bearing from a monocular camera. The outputs of this neural network are used to determine the latitude and longitude of the identified vessel.
Submitted 17 May, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
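Turning an estimated distance and bearing into latitude and longitude is a standard geodesy step. The sketch below uses the spherical-Earth "destination point" formula and assumes the camera bearing is measured relative to the USV's bow, so the USV's own heading is added to obtain a true bearing. These are assumptions for illustration: the abstract does not specify how DisBeaNet performs this conversion, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def destination(lat_deg, lon_deg, bearing_deg, distance_m):
    """Point reached from (lat, lon) after distance_m along a true bearing (great circle)."""
    phi1, lam1 = math.radians(lat_deg), math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # normalize to [-180, 180)
    return math.degrees(phi2), lon2


def vessel_position(own_lat, own_lon, own_heading_deg, rel_bearing_deg, range_m):
    """Convert a bow-relative bearing and range into the target vessel's lat/lon."""
    true_bearing = (own_heading_deg + rel_bearing_deg) % 360.0
    return destination(own_lat, own_lon, true_bearing, range_m)


# A target one nautical mile away, 45 degrees to starboard of a USV heading 045.
print(vessel_position(0.0, 0.0, own_heading_deg=45.0, rel_bearing_deg=45.0, range_m=1852.0))
```

At the short ranges a monocular camera can resolve, the spherical model is usually adequate; an ellipsoidal solution such as Vincenty's would be the next refinement.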
-
Computational complexity and quantum interpretations
Authors:
Vivek Kumar,
M. P. Singh,
R. Srikanth
Abstract:
In computational complexity theory, it remains to be understood whether $\textbf{BQP}$ is the same as $\textbf{BPP}$. Prima facie, one would expect this mathematical question to be unrelated to the foundational question of whether the quantum state is an element of reality or of the observer's knowledge. By contrast, here we argue that the complexity of computation in a physical theory may constrain its physical interpretation. Specifically, in the quantum case, we argue that a subjective interpretation of quantum mechanics favors the proposition $\textbf{BQP} = \textbf{BPP}$. Therefore, if $\textbf{BPP} \subset \textbf{BQP}$, then a realist interpretation of quantum mechanics would be favored.
Submitted 6 May, 2024;
originally announced May 2024.
-
COPAL: Continual Pruning in Large Language Generative Models
Authors:
Srikanth Malla,
Joon Hee Choi,
Chiho Choi
Abstract:
Adapting pre-trained large language models to different domains in natural language processing raises two key issues: high computational demands and the model's inability to adapt continually. To address both issues simultaneously, this paper presents COPAL (COntinual Pruning in Adaptive Language settings), an algorithm for pruning large language generative models in a continual model adaptation setting. While avoiding resource-heavy finetuning or retraining, our pruning process is guided by the proposed sensitivity analysis. The sensitivity effectively measures the model's ability to withstand perturbations introduced by the new dataset and identifies the model weights that are relevant for all encountered datasets. As a result, COPAL allows seamless model adaptation to new domains while enhancing resource efficiency. Our empirical evaluation on LLMs of various sizes shows that COPAL outperforms baseline models, demonstrating its efficiency and adaptability.
Submitted 14 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Efficient Algorithms for Earliest and Fastest Paths in Public Transport Networks
Authors:
Mithinti Srikanth,
G. Ramakrishna
Abstract:
Public transport administrators rely on efficient algorithms for various problems that arise in public transport networks. In particular, our study focuses on designing linear-time algorithms for two fundamental path problems on public transportation data: the earliest arrival time (\textsc{eat}) and the fastest path duration (\textsc{fpd}). We conduct a comparative analysis with state-of-the-art algorithms, and the results indicate substantial efficiency improvements: a 34-fold speedup for the fastest path problem and an even larger 183-fold speedup for the earliest arrival time problem. These findings highlight the effectiveness of our algorithms in solving \textsc{eat} and \textsc{fpd} problems in public transport, and can ultimately help public administrators enrich the urban transport experience.
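To illustrate why earliest arrival time can be solved in linear time, here is a sketch of the well-known connection-scan approach: if all elementary connections are pre-sorted by departure time, one pass over them suffices. This is not necessarily the authors' algorithm; it is a generic technique, and it assumes zero transfer times and no footpaths. The function name `earliest_arrival` and the toy timetable are hypothetical.

```python
import math


def earliest_arrival(connections, source, start_time):
    """Connection-scan sketch of the earliest-arrival-time (EAT) problem.

    connections: list of (dep_stop, arr_stop, dep_time, arr_time) tuples that
    must already be sorted by dep_time; a single pass makes the scan linear in
    the number of connections. Transfer times and footpaths are ignored."""
    eat = {source: start_time}
    for dep_stop, arr_stop, dep_time, arr_time in connections:
        # Usable only if we can be at the departure stop by the departure time.
        reachable = eat.get(dep_stop, math.inf) <= dep_time
        if reachable and arr_time < eat.get(arr_stop, math.inf):
            eat[arr_stop] = arr_time
    return eat


# Usage: times in minutes after midnight, sorted by departure time.
timetable = [("A", "B", 480, 540), ("A", "C", 500, 620),
             ("B", "C", 545, 600), ("C", "D", 610, 650)]
print(earliest_arrival(timetable, "A", 470))
# {'A': 470, 'B': 540, 'C': 600, 'D': 650}
```

Note how the direct A to C connection (arriving 620) is later improved by the transfer via B (arriving 600), which then enables the C to D connection.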
Submitted 30 April, 2024;
originally announced April 2024.
-
NeuroKoopman Dynamic Causal Discovery
Authors:
Rahmat Adesunkanmi,
Balaji Sesha Srikanth Pokuri,
Ratnesh Kumar
Abstract:
In many real-world applications where the system dynamics involve interdependencies among variables (such as power grids, economics, neuroscience, omics networks, and environmental ecosystems), one is often interested in whether the past values of one time series influence the future of another, known as Granger causality, and in the associated underlying dynamics. This paper introduces a Koopman-inspired framework that leverages neural networks for data-driven learning of the Koopman bases, termed NeuroKoopman Dynamic Causal Discovery (NKDCD), for reliably inferring Granger causality along with the underlying nonlinear dynamics. NKDCD employs an autoencoder architecture that lifts the nonlinear dynamics to a higher dimension using data-learned bases, where the lifted time series can be reliably modeled linearly. The lifting function, the linear Granger causality lag matrices, and the projection function (from lifted space to base space) are all represented as multilayer perceptrons and are learned jointly. NKDCD also applies sparsity-inducing penalties to the weights of the lag matrices, encouraging the model to select only the causal dependencies the data require. Through extensive testing on practically applicable datasets, NKDCD is shown to outperform existing nonlinear Granger causality discovery approaches.
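The role of the lag matrices can be sketched in the plain linear case: a vector autoregression predicts $x_t = \sum_k A_k x_{t-k}$, and series $j$ Granger-causes series $i$ when some $A_k[i][j]$ is nonzero. This is a simplification under stated assumptions: NKDCD applies its lag matrices in a learned lifted space, whereas this sketch works directly on the raw series, and the helper names `var_predict` and `granger_edges` as well as the matrices are hypothetical.

```python
def var_predict(lag_mats, history):
    """One-step linear VAR prediction x_t = sum_k A_k x_{t-k}.

    lag_mats[k-1] is the p x p lag matrix A_k; history[-k] is x_{t-k}."""
    p = len(history[-1])
    pred = [0.0] * p
    for k, a_k in enumerate(lag_mats, start=1):
        x_prev = history[-k]
        for i in range(p):
            pred[i] += sum(a_k[i][j] * x_prev[j] for j in range(p))
    return pred


def granger_edges(lag_mats, tol=1e-8):
    """Return edges (j, i), meaning series j Granger-causes series i, when some
    lag matrix has |A_k[i][j]| > tol. A sparsity penalty during training is what
    pushes the remaining entries towards zero."""
    p = len(lag_mats[0])
    return {(j, i) for i in range(p) for j in range(p)
            if i != j and any(abs(a_k[i][j]) > tol for a_k in lag_mats)}


A1 = [[0.5, 0.0], [0.3, 0.2]]  # series 1 depends on series 0 at lag 1
A2 = [[0.0, 0.0], [0.0, 0.1]]
print(granger_edges([A1, A2]))  # {(0, 1)}
print(var_predict([A1, A2], [[1.0, 2.0], [3.0, 4.0]]))
```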
Submitted 25 April, 2024;
originally announced April 2024.
-
Locally dualisable modular representations and local regularity
Authors:
Dave Benson,
Srikanth B. Iyengar,
Henning Krause,
Julia Pevtsova
Abstract:
This work concerns the stable module category of a finite group over a field of characteristic dividing the group order. The minimal localising tensor ideals correspond to the non-maximal homogeneous prime ideals in the cohomology ring of the group. Given such a prime ideal, a number of characterisations of the dualisable objects in the corresponding tensor ideal are given. One characterisation of interest is that they are exactly the modules whose restriction along a corresponding $π$-point are finite dimensional plus projective. A key insight is the identification of a special property of the stable module category that controls the cohomological behaviour of local dualisable objects. This property, introduced in this work for general triangulated categories and called local regularity, is related to strong generation. A major part of the paper is devoted to developing this notion and investigating its ramifications for various special classes of objects in tensor triangulated categories.
Submitted 22 April, 2024;
originally announced April 2024.
-
Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement
Authors:
Pushkar Shukla,
Dhruv Srikanth,
Lee Cohen,
Matthew Turk
Abstract:
We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning. While counterfactuals have been used to analyze and address biases in DNN models, the counterfactuals themselves are often generated from biased generative models, which can introduce additional biases or spurious correlations. To address this issue, we propose using adversarial images, that is, images that deceive a deep neural network but not humans, as counterfactuals for fair model training. Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples. By incorporating adversarial images into the training data, we aim to prevent biases from propagating through the pipeline. We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods. Qualitatively, our results indicate that, after training, the model's decisions are less dependent on the sensitive attribute and that the model better disentangles the relationship between sensitive attributes and classification variables.
Submitted 27 June, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
How often are errors in natural language reasoning due to paraphrastic variability?
Authors:
Neha Srikanth,
Marine Carpuat,
Rachel Rudinger
Abstract:
Large language models have been shown to behave inconsistently in response to meaning-preserving paraphrastic inputs. At the same time, researchers evaluate the knowledge and reasoning abilities of these models with test evaluations that do not disaggregate the effect of paraphrastic variability on performance. We propose a metric for evaluating the paraphrastic consistency of natural language reasoning models based on the probability of a model achieving the same correctness on two paraphrases of the same problem. We mathematically connect this metric to the proportion of a model's variance in correctness attributable to paraphrasing. To estimate paraphrastic consistency, we collect ParaNLU, a dataset of 7,782 human-written and validated paraphrased reasoning problems constructed on top of existing benchmark datasets for defeasible and abductive natural language inference. Using ParaNLU, we measure the paraphrastic consistency of several model classes and show that consistency dramatically increases with pretraining but not finetuning. All models tested exhibited room for improvement in paraphrastic consistency.
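The consistency metric, the probability that a model achieves the same correctness on two paraphrases of the same problem, can be estimated empirically as sketched below. This is one plausible estimator under stated assumptions, not necessarily the paper's exact formulation: each problem contributes the fraction of its unordered paraphrase pairs with matching correctness, and problems are weighted equally. The function name `paraphrastic_consistency` and the toy scores are hypothetical.

```python
from itertools import combinations


def paraphrastic_consistency(results):
    """Estimate P(same correctness on two paraphrases of the same problem).

    results maps a problem id to a list of 0/1 correctness scores, one per
    paraphrase. Each problem contributes the fraction of its unordered paraphrase
    pairs with matching correctness; problems are then averaged equally."""
    per_problem = []
    for scores in results.values():
        pairs = list(combinations(scores, 2))
        if not pairs:
            continue  # a single paraphrase gives no pair to compare
        agree = sum(1 for a, b in pairs if a == b)
        per_problem.append(agree / len(pairs))
    if not per_problem:
        raise ValueError("need at least one problem with two or more paraphrases")
    return sum(per_problem) / len(per_problem)


scores = {"p1": [1, 1, 1], "p2": [1, 0], "p3": [0, 0, 1, 1]}
print(paraphrastic_consistency(scores))  # (1 + 0 + 1/3) / 3 = 0.444...
```

Weighting problems equally keeps a problem with many paraphrases from dominating the estimate; weighting by pair count instead would be an equally defensible choice.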
Submitted 17 April, 2024;
originally announced April 2024.
-
Local Correction of Linear Functions over the Boolean Cube
Authors:
Prashanth Amireddy,
Amik Raj Behera,
Manaswi Paraashar,
Srikanth Srinivasan,
Madhu Sudan
Abstract:
We consider the task of locally correcting, and locally list-correcting, multivariate linear functions over the domain $\{0,1\}^n$ over arbitrary fields and more generally Abelian groups. Such functions form error-correcting codes of relative distance $1/2$ and we give local-correction algorithms correcting up to nearly $1/4$-fraction errors making $\widetilde{\mathcal{O}}(\log n)$ queries. This query complexity is optimal up to $\mathrm{poly}(\log\log n)$ factors. We also give local list-correcting algorithms correcting $(1/2 - \varepsilon)$-fraction errors with $\widetilde{\mathcal{O}}_{\varepsilon}(\log n)$ queries.
These results may be viewed as natural generalizations of the classical work of Goldreich and Levin, which addresses the special case where the underlying group is $\mathbb{Z}_2$. By extending to the case where the underlying group is, say, the reals, we give the first non-trivial locally correctable codes (LCCs) over the reals, with query complexity sublinear in the dimension (also known as the message length).
The central challenge in constructing the local corrector is constructing "nearly balanced vectors" over $\{-1,1\}^n$ that span $1^n$ -- we show how to construct $\mathcal{O}(\log n)$ vectors that do so, with entries in each vector summing to $\pm1$. The challenge to the local-list-correction algorithms, given the local corrector, is principally combinatorial, i.e., in proving that the number of linear functions within any Hamming ball of radius $(1/2-\varepsilon)$ is $\mathcal{O}_{\varepsilon}(1)$. Getting this general result covering every Abelian group requires integrating a variety of known methods with some new combinatorial ingredients analyzing the structural properties of codewords that lie within small Hamming balls.
Submitted 25 April, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
The State of Lithium-Ion Battery Health Prognostics in the CPS Era
Authors:
Gaurav Shinde,
Rohan Mohapatra,
Pooja Krishan,
Harish Garg,
Srikanth Prabhu,
Sanchari Das,
Mohammad Masum,
Saptarshi Sengupta
Abstract:
Lithium-ion batteries (Li-ion) have revolutionized energy storage technology, becoming integral to our daily lives by powering a diverse range of devices and applications. Their high energy density, fast power response, recyclability, and mobility advantages have made them the preferred choice for numerous sectors. This paper explores the seamless integration of Prognostics and Health Management (PHM) within batteries, presenting a multidisciplinary approach that enhances the reliability, safety, and performance of these powerhouses. Remaining useful life (RUL), a critical concept in prognostics, is examined in depth, emphasizing its role in predicting component failure before it occurs. The paper reviews various RUL prediction methods, from traditional models to cutting-edge data-driven techniques. Furthermore, it highlights the paradigm shift toward deep learning architectures within the field of Li-ion battery health prognostics, elucidating the pivotal role of deep learning in addressing battery system complexities. Practical applications of PHM across industries are also explored, offering readers insights into real-world implementations. This paper serves as a comprehensive guide, catering to both researchers and practitioners in the field of Li-ion battery PHM.
Submitted 28 March, 2024;
originally announced March 2024.
-
Non-existence of Ulrich modules over Cohen-Macaulay local rings
Authors:
Srikanth B. Iyengar,
Linquan Ma,
Mark E. Walker,
Ziquan Zhuang
Abstract:
Over a Cohen-Macaulay local ring, the minimal number of generators of a maximal Cohen-Macaulay module is bounded above by its multiplicity. In 1984 Ulrich asked whether there always exist modules for which equality holds; such modules are known nowadays as Ulrich modules. We answer this question in the negative by constructing families of two-dimensional Cohen-Macaulay local rings that have no Ulrich modules. Some of these examples are Gorenstein normal domains; others are even complete intersection domains, though not normal.
Submitted 12 March, 2025; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation
Authors:
Kasi Viswanath,
Peng Jiang,
Srikanth Saripalli
Abstract:
LiDAR semantic segmentation frameworks predominantly use geometry-based features to differentiate objects within a scan. Although these methods excel in scenarios with clear boundaries and distinct shapes, their performance declines in environments where boundaries are indistinct, particularly in off-road contexts. To address this issue, recent advances in 3D segmentation algorithms have aimed to leverage raw LiDAR intensity readings to improve prediction precision. However, despite these advances, existing learning-based models struggle to capture the complex interactions between raw intensity and variables such as distance, incidence angle, material reflectivity, and atmospheric conditions. Building upon our previous work, this paper explores the advantages of employing calibrated intensity (also referred to as reflectivity) within learning-based LiDAR semantic segmentation frameworks. We start by demonstrating that adding reflectivity as an input enhances the LiDAR semantic segmentation model by providing a better data representation. Extensive experimentation with the RELLIS-3D off-road dataset shows that replacing intensity with reflectivity results in a 4\% improvement in mean Intersection over Union (mIoU) for off-road scenarios. We also demonstrate the potential benefits of using calibrated intensity for semantic segmentation in urban environments (SemanticKITTI) and for cross-sensor domain adaptation. Additionally, we test the Segment Anything Model (SAM) with reflectivity as input, which yields improved segmentation masks for LiDAR images.
Submitted 30 September, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization
Authors:
Peng Jiang,
Gaurav Pandey,
Srikanth Saripalli
Abstract:
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset.
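The correspondence step relies on normalized cross-correlation between patches of the query image and the rendered image. A minimal sketch of zero-mean NCC on two flattened grayscale patches is shown below; it illustrates the standard formula, not the system's actual implementation, and the function name `ncc` and the sample patches are hypothetical.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-size patches,
    given as flat lists of pixel intensities. Returns a value in [-1, 1]."""
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and the same size")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [x - mean_a for x in patch_a]
    db = [y - mean_b for y in patch_b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0.0 else 0.0  # flat patch: correlation undefined


query = [10, 20, 30, 40]
rendered = [2 * v + 5 for v in query]  # same structure, different gain and offset
print(ncc(query, rendered))  # 1.0
```

Subtracting the means and normalizing makes the score invariant to affine brightness changes, which is useful when comparing a real camera image against a rendered view whose exposure will not match exactly.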
Submitted 17 March, 2024;
originally announced March 2024.
-
Read between the lines -- Functionality Extraction From READMEs
Authors:
Prince Kumar,
Srikanth Tamilselvam,
Dinesh Garg
Abstract:
While text summarization is a well-known NLP task, in this paper we introduce a novel and useful variant of it called functionality extraction from Git README files. Though at an abstract level this task is text2text generation, it has its own peculiarities and challenges that make existing text2text generation systems of limited use. The motivation behind this task stems from a recent surge in research and development activities around the use of large language models for code-related tasks, such as code refactoring and code summarization. We also release a human-annotated dataset called FuncRead and develop a battery of models for the task. Our exhaustive experimentation shows that small fine-tuned models beat any baseline that can be designed using popular black-box or white-box large language models (LLMs) such as ChatGPT and Bard. Our best model, a fine-tuned 7-billion-parameter CodeLlama, exhibits 70% and 20% gains in F1 score over ChatGPT and Bard, respectively.
Submitted 15 March, 2024;
originally announced March 2024.
-
Contextuality, superlocality and nonclassicality of supernoncontextuality
Authors:
Chellasamy Jebarathinam,
R. Srikanth
Abstract:
Contextuality is a fundamental manifestation of nonclassicality, indicating that for certain quantum correlations, sets of jointly measurable variables cannot be pre-assigned values independently of the measurement context. In this work, we characterize nonclassical quantum correlation beyond contextuality in terms of supernoncontextuality, namely the higher-than-quantum hidden-variable (HV) dimensionality required to reproduce the given noncontextual quantum correlations. Thus supernoncontextuality is the contextuality analogue of superlocality. Specifically, we study the quantum system of two-qubit states in a scenario composed of five contexts that demonstrate contextuality in a state-dependent fashion. For this purpose, we use the framework of boxes, whose behavior is described by a set of probabilities satisfying the no-disturbance conditions. We first demonstrate that while superlocality is necessary to observe a contextual box, superlocality is not sufficient for contextuality. On the other hand, a noncontextual superlocal box can be supernoncontextual, but superlocality is not a necessary condition. We then introduce a notion of nonclassicality beyond the standard contextuality, called semi-device-independent contextuality. We study semi-device-independent contextuality of two-qubit states in the above-mentioned scenario and demonstrate how supernoncontextuality implies this nonclassicality. To this end, we propose a criterion and a measure of semi-device-independent contextuality.
Submitted 20 November, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Entropic Cohesion in Vitrimers
Authors:
Rahul Karmakar,
Himanshu,
Srikanth Sastry,
Sanat K Kumar,
Tarak K Patra
Abstract:
Vitrimers are polymer networks that can undergo bond exchange reactions. They dynamically rearrange their structures while maintaining their overall integrity, resulting in unique properties such as self-healing, reprocessability, shape memory and adaptability. Here, we show that the introduction of dynamic bonds directly impacts the polymer density. For a limiting case, where the dynamic bonds are the same size as the polymer chain bonds, simulations and theory show an enhancement in the density, because these bonds induce an increased cohesive force in the liquid, which is entropic in origin. The crosslinks are well mixed in the bulk but are depleted from the air-polymer interface. These findings implicate density as a key variable in polymers with dynamic crosslinkers, one that can be used to tune their properties easily.
Submitted 25 February, 2024;
originally announced February 2024.
-
Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation
Authors:
Bharat Srikishan,
Anika Tabassum,
Srikanth Allu,
Ramakrishnan Kannan,
Nikhil Muralidhar
Abstract:
Deep learning architectures have achieved state-of-the-art (SOTA) performance on computer vision tasks such as object detection and image segmentation. This may be attributed to the use of over-parameterized, monolithic deep learning architectures executed on large datasets. Although such architectures lead to increased accuracy, this is usually accompanied by a large increase in computation and memory requirements during inference. While this is a non-issue in traditional machine learning pipelines, the recent confluence of machine learning and fields like the Internet of Things has rendered such large architectures infeasible for execution in low-resource settings. In such settings, previous efforts have proposed decision cascades where inputs are passed through models of increasing complexity until desired performance is achieved. However, we argue that cascaded prediction leads to increased computational cost due to wasteful intermediate computations. To address this, we propose PaSeR (Parsimonious Segmentation with Reinforcement Learning), a non-cascading, cost-aware learning pipeline, as an alternative to cascaded architectures. Through experimental evaluation on real-world and standard datasets, we demonstrate that PaSeR achieves better accuracy while minimizing computational cost relative to cascaded models. Further, we introduce a new metric, IoU/GigaFlop, to evaluate the balance between cost and performance. On the real-world task of battery material phase segmentation, PaSeR yields a minimum performance improvement of 174% on the IoU/GigaFlop metric with respect to baselines. We also demonstrate PaSeR's adaptability to complementary models trained on a noisy MNIST dataset, where it achieves a minimum improvement of 13.4% on IoU/GigaFlop over SOTA models. Code and data are available at https://github.com/scailab/paser .
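The IoU/GigaFlop metric, and the percentage improvements quoted against it, can be sketched as follows. This assumes the metric is simply segmentation IoU divided by inference cost in GFLOPs and that improvements are relative differences; how the paper aggregates IoU (per image, per class) and counts FLOPs may differ. The function names are hypothetical, and the masks and FLOP count are toy values, not results from the paper.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (flat 0/1 sequences)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # both empty: treat as perfect match


def iou_per_gigaflop(pred, target, flops):
    """Segmentation quality per unit of inference cost (flops = total FLOPs used)."""
    return iou(pred, target) / (flops / 1e9)


def relative_improvement(new, baseline):
    """Percentage improvement of a metric over a baseline value."""
    return 100.0 * (new - baseline) / baseline


pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
print(iou_per_gigaflop(pred, target, 2e9))  # (1/3) / 2 = 0.1666...
print(relative_improvement(3.0, 2.0))  # 50.0
```

Dividing by cost rewards a model that reaches slightly lower IoU with far fewer operations, which is the trade-off the metric is meant to expose.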
Submitted 18 February, 2024;
originally announced February 2024.
-
FLASH: Federated Learning Across Simultaneous Heterogeneities
Authors:
Xiangyu Chang,
Sk Miraj Ahmed,
Srikanth V. Krishnamurthy,
Basak Guler,
Ananthram Swami,
Samet Oymak,
Amit K. Roy-Chowdhury
Abstract:
The key premise of federated learning (FL) is to train ML models across a diverse set of data owners (clients) without exchanging local data. An overarching challenge to date is client heterogeneity, which may arise not only from variations in data distribution, but also from differences in data quality and compute/communication latency. An integrated view of these diverse and concurrent sources of heterogeneity is critical; for instance, low-latency clients may have poor data quality, and vice versa. In this work, we propose FLASH (Federated Learning Across Simultaneous Heterogeneities), a lightweight and flexible client selection algorithm that outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity by trading off the statistical information associated with each client's data quality, data distribution, and latency. FLASH is the first method, to our knowledge, for handling all these heterogeneities in a unified manner. To do so, FLASH models the learning dynamics through contextual multi-armed bandits (CMAB) and dynamically selects the most promising clients. Through extensive experiments, we demonstrate that FLASH achieves substantial and consistent improvements over state-of-the-art baselines (as much as 10% in absolute accuracy) thanks to its unified approach. Importantly, FLASH also outperforms federated aggregation methods that are designed to handle highly heterogeneous settings and even enjoys a performance boost when integrated with them.
Submitted 13 February, 2024;
originally announced February 2024.
-
Next-Generation Teleophthalmology: AI-enabled Quality Assessment Aiding Remote Smartphone-based Consultation
Authors:
Dhruv Srikanth,
Jayang Gurung,
N Satya Deepika,
Vineet Joshi,
Lopamudra Giri,
Pravin Vaddavalli,
Soumya Jana
Abstract:
Blindness and other eye diseases are a global health concern, particularly in low- and middle-income countries like India. During the COVID-19 pandemic, teleophthalmology became a lifeline, and the Grabi attachment for smartphone-based eye imaging gained wider use. However, the quality of user-captured images often remained inadequate, requiring clinician vetting and causing delays. Against this backdrop, we propose an AI-based quality assessment system that gives instant feedback mimicking clinicians' judgments, tested on patient-captured images. Dividing the complex problem hierarchically, here we tackle a nontrivial part of it and demonstrate a proof of concept.
Submitted 7 August, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
A Survey on Context-Aware Multi-Agent Systems: Techniques, Challenges and Future Directions
Authors:
Hung Du,
Srikanth Thudumu,
Rajesh Vasa,
Kon Mouzakis
Abstract:
Research interest in autonomous agents is rising. The notable achievements of Large Language Models (LLMs) have demonstrated considerable potential for attaining human-like intelligence in autonomous agents. However, the challenge lies in enabling these agents to learn, reason, and navigate uncertainties in dynamic environments. Context awareness emerges as a pivotal element in fortifying multi-agent systems when dealing with dynamic situations. Although existing research covers both context-aware systems and multi-agent systems, comprehensive surveys outlining techniques for integrating context-aware systems with multi-agent systems are lacking. To address this gap, this survey provides a comprehensive overview of state-of-the-art context-aware multi-agent systems. First, we outline the properties of both context-aware systems and multi-agent systems that facilitate integration between these systems. Subsequently, we propose a general process for context-aware systems, with each phase of the process encompassing diverse approaches drawn from various application domains such as collision avoidance in autonomous driving, disaster relief management, utility management, supply chain management, human-AI interaction, and others. Finally, we discuss the existing challenges of context-aware multi-agent systems and provide future research directions in this field.
Submitted 29 January, 2025; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Tailoring magnetic and hyperthermia properties of biphase iron oxide nanocubes through post-annealing
Authors:
Supun B. Attanayake,
Amit Chanda,
Raja Das,
Manh-Huong Phan,
Hariharan Srikanth
Abstract:
Tailoring the magnetic properties of iron oxide nanosystems is essential to expand their biomedical applications. In this study, 34 nm biphase iron oxide nanocubes consisting of Fe3O4 and alpha-Fe2O3 were annealed for 2 hours in O2, N2, He, and Ar atmospheres to tune the respective phase volume fractions and control the magnetic properties. X-ray diffraction and magnetic measurements were carried out after treatment to evaluate the changes in the treated samples relative to the as-prepared samples; these showed an enhancement of the alpha-Fe2O3 phase in the samples annealed in O2, whereas the others showed Fe3O4 enhancement. Furthermore, the latter samples showed enhanced crystallinity and saturation magnetization, while the coercivity enhancement was most significant in the samples annealed in O2, resulting in the highest specific absorption rates (up to 1000 W/g) at all applied fields of 800, 600, and 400 Oe in agar during magnetic hyperthermia measurements. The general enhancement in the specific absorption rate after annealing underscores the importance of the annealing atmosphere in enhancing the magnetic and structural properties of nanostructures.
Submitted 31 January, 2024;
originally announced January 2024.
-
LLMs for Test Input Generation for Semantic Caches
Authors:
Zafaryab Rasool,
Scott Barnett,
David Willie,
Stefanus Kurniawan,
Sherwin Balugo,
Srikanth Thudumu,
Mohamed Abdelrazek
Abstract:
Large language models (LLMs) enable state-of-the-art semantic capabilities to be added to software systems, such as semantic search of unstructured documents and text generation. However, these models are computationally expensive. At scale, the cost of serving thousands of users increases massively, also affecting user experience. To address this problem, semantic caches are used to check for answers to similar queries (that may have been phrased differently) without hitting the LLM service. Because these semantic cache techniques rely on query embeddings, there is a high chance of errors that impact user confidence in the system. Adopting semantic cache techniques usually requires testing the effectiveness of a semantic cache (accurate cache hits and misses), which requires a labelled test set of similar queries and responses that is often unavailable. In this paper, we present VaryGen, an approach for using LLMs for test input generation that produces similar questions from unstructured text documents. Our novel approach uses the reasoning capabilities of LLMs to 1) adapt queries to the domain, 2) synthesise subtle variations to queries, and 3) evaluate the synthesised test dataset. We evaluated our approach in the domain of a student question and answer system by qualitatively analysing 100 generated queries and result pairs, and by conducting an empirical case study with an open source semantic cache. Our results show that query pairs satisfy human expectations of similarity and that our generated data demonstrates failure cases of a semantic cache. We also evaluate our approach on the Qasper dataset. This work is an important first step into test input generation for semantic applications and presents considerations for practitioners when calibrating a semantic cache.
Submitted 16 January, 2024;
originally announced January 2024.
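The VaryGen abstract describes semantic caches that answer a query from a stored response when its embedding is close enough to a previously seen query, and the need for labelled query pairs to test hit/miss accuracy. A minimal sketch of such a cache, plus a harness that counts false hits and false misses on labelled pairs, follows. The cosine-similarity threshold of 0.9, the names `SemanticCache` and `evaluate`, and the assumption that embeddings arrive as precomputed NumPy vectors are all hypothetical; this is not the open-source cache used in the paper.

```python
import numpy as np


class SemanticCache:
    """Toy embedding cache: returns a stored answer when a new query's
    embedding has cosine similarity >= threshold with a cached query."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.keys: list = []    # unit-normalized query embeddings
        self.values: list = []  # cached answers, aligned with keys

    @staticmethod
    def _normalize(v: np.ndarray) -> np.ndarray:
        return v / np.linalg.norm(v)

    def lookup(self, query_emb: np.ndarray):
        """Return the best-matching cached answer, or None on a cache miss."""
        if not self.keys:
            return None
        q = self._normalize(query_emb)
        sims = np.array([k @ q for k in self.keys])  # cosine similarities
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def insert(self, query_emb: np.ndarray, answer: str) -> None:
        self.keys.append(self._normalize(query_emb))
        self.values.append(answer)


def evaluate(pairs, threshold: float = 0.9):
    """pairs: (seed_emb, variant_emb, should_hit) triples, e.g. from an
    LLM-based generator. Returns (false_hits, false_misses)."""
    false_hits = false_misses = 0
    for seed, variant, should_hit in pairs:
        cache = SemanticCache(threshold)
        cache.insert(seed, "cached answer")
        hit = cache.lookup(variant) is not None
        false_hits += int(hit and not should_hit)    # wrong answer served
        false_misses += int(should_hit and not hit)  # LLM called unnecessarily
    return false_hits, false_misses
```

Sweeping `threshold` over a generated test set is one way to expose the trade-off between the two error types when calibrating such a cache.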
-
Seven Failure Points When Engineering a Retrieval Augmented Generation System
Authors:
Scott Barnett,
Stefanus Kurniawan,
Srikanth Thudumu,
Zach Brannelly,
Mohamed Abdelrazek
Abstract:
Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system involves finding documents that semantically match a query and then passing the documents to a large language model (LLM) such as ChatGPT to extract the right answer. RAG systems aim to: a) reduce the problem of hallucinated responses from LLMs, b) link sources/references to generated responses, and c) remove the need for annotating documents with meta-data. However, RAG systems suffer from limitations inherent to information retrieval systems and from reliance on LLMs. In this paper, we present an experience report on the failure points of RAG systems from three case studies from separate domains: research, education, and biomedical. We share the lessons learned and present 7 failure points to consider when designing a RAG system. The two key takeaways arising from our work are: 1) validation of a RAG system is only feasible during operation, and 2) the robustness of a RAG system evolves rather than being designed in at the start. We conclude with a list of potential research directions on RAG systems for the software engineering community.
Submitted 11 January, 2024;
originally announced January 2024.
-
Plug-and-Play Transformer Modules for Test-Time Adaptation
Authors:
Xiangyu Chang,
Sk Miraj Ahmed,
Srikanth V. Krishnamurthy,
Basak Guler,
Ananthram Swami,
Samet Oymak,
Amit K. Roy-Chowdhury
Abstract:
Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate customized tuned modules for each such domain. Toward addressing these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for different source domains, effectively creating a "module store". Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of selected modules without tuning their weights. This plug-and-play nature enables us to harness multiple most-relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting at most 5 modules suffices to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation.
Submitted 8 February, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
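PLUTO selects a sparse subset of frozen modules from a store and combines them with weights, without updating the modules themselves. The sketch below illustrates only that select-then-mix pattern, applied to output probabilities. The selection criterion (lowest mean prediction entropy on the unlabeled target batch), the softmax weighting, and the function name `select_and_combine` are hypothetical choices, not the paper's method.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def select_and_combine(module_logits: np.ndarray, k: int = 5, temperature: float = 1.0):
    """Pick k frozen modules and mix their predictions without tuning them.

    module_logits: shape (n_modules, n_samples, n_classes), each module's logits
    on the same batch of unlabeled target-domain inputs.
    Returns (selected module indices, mixing weights, combined probabilities).
    """
    probs = softmax(module_logits)  # (M, N, C)
    # Mean predictive entropy per module: lower means more confident on this domain.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean(axis=-1)  # (M,)
    selected = np.argsort(entropy)[:k]  # sparse subset: the k most confident modules
    weights = softmax(-entropy[selected] / temperature)  # mixing weights; modules stay frozen
    combined = np.tensordot(weights, probs[selected], axes=1)  # weighted sum -> (N, C)
    return selected, weights, combined
```

Because only k scalar weights are computed per target domain, the cost of adapting is a single pass over the unlabeled batch per candidate module.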
-
Locally dualizable modules abound
Authors:
Jon F. Carlson,
Srikanth B. Iyengar
Abstract:
It is proved that given any prime ideal $\mathfrak{p}$ of height at least 2 in a countable commutative noetherian ring $A$, there are uncountably many more dualizable objects in the $\mathfrak{p}$-local $\mathfrak{p}$-torsion stratum of the derived category of $A$ than those that are obtained as retracts of images of perfect $A$-complexes. An analogous result is established dealing with the stable module category of the group algebra, over a countable field of positive characteristic $p$, of an elementary abelian $p$-group of rank at least 3.
Submitted 4 January, 2024;
originally announced January 2024.
-
Off-Road LiDAR Intensity Based Semantic Segmentation
Authors:
Kasi Viswanath,
Peng Jiang,
Sujit PB,
Srikanth Saripalli
Abstract:
LiDAR is used in autonomous driving to provide 3D spatial information and enable accurate perception in off-road environments, aiding in obstacle detection, mapping, and path planning. Learning-based LiDAR semantic segmentation utilizes machine learning techniques to automatically classify objects and regions in LiDAR point clouds. Learning-based models struggle in off-road environments due to the presence of diverse objects with varying colors, textures, and undefined boundaries, which can lead to difficulties in accurately classifying and segmenting objects using traditional geometric-based features. In this paper, we address this problem by harnessing the LiDAR intensity parameter to enhance object segmentation in off-road environments. Our approach was evaluated on the RELLIS-3D dataset and yielded promising preliminary results, with improved mIoU for the classes "puddle" and "grass" compared to more complex deep learning-based benchmarks. The methodology was evaluated for compatibility across both Velodyne and Ouster LiDAR systems, supporting its cross-platform applicability. This analysis advocates for the incorporation of calibrated intensity as a supplementary input, aiming to enhance the prediction accuracy of learning-based semantic segmentation frameworks. https://github.com/MOONLABIISERB/lidar-intensity-predictor/tree/main
Submitted 2 January, 2024;
originally announced January 2024.
-
Convolution Neural Network Model Framework to Predict Microscale Drag Force for Turbulent Flow in Porous Media
Authors:
Vishal Srikanth,
Andrey V. Kuznetsov
Abstract:
Convolution Neural Networks (CNN) are well-suited to model the nonlinear relationship between the microscale geometry of porous media and the corresponding flow distribution, thereby accurately and efficiently coupling the flow behavior at the micro- and macroscale levels. In this paper, we have identified the challenges involved in implementing CNNs for macroscale model closure in the turbulent flow regime, particularly in the prediction of the drag force components arising from the microscale level. We report that significant error is incurred in the crucial data preparation step when the Reynolds averaged pressure and velocity distributions are interpolated from unstructured stretched grids used for Large Eddy Simulation (LES) to the structured uniform grids used by the CNN model. We show that the range of the microscale velocity values is 10 times larger than the range of the pressure values. This invalidates the use of the mean squared error loss function to train the CNN model for multivariate prediction. We have developed a CNN model framework that addresses these challenges by proposing a conservative interpolation method and a normalized mean squared error loss function. We simulated a model dataset to train the CNN for turbulent flow prediction in periodic porous media composed of cylindrical solid obstacles with square cross-section by varying the porosity in the range 0.3 to 0.88. We demonstrate that the resulting CNN model predicts the pressure and viscous drag forces with less than 10% mean absolute error when compared to LES while offering a speedup of O(10^6).
Submitted 21 December, 2023;
originally announced December 2023.
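The drag-force abstract notes that velocity values span a range about 10 times larger than pressure values, so a plain mean squared error over both fields is dominated by velocity. The abstract does not give the paper's normalized loss, so the sketch below assumes one plausible form: each field's MSE is divided by the squared range of that field in the target before averaging. The function name `normalized_mse` and the array layout are assumptions.

```python
import numpy as np


def normalized_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error with each field scaled by its squared range.

    pred, target: shape (n_samples, n_fields, ...), where axis 1 indexes the
    predicted fields (e.g. pressure and velocity components) and any trailing
    axes are grid dimensions. Assumes every field has a nonzero range in target.
    """
    # Reduce over every axis except the field axis.
    axes = tuple(i for i in range(target.ndim) if i != 1)
    field_range = target.max(axis=axes) - target.min(axis=axes)  # (n_fields,)
    per_field_mse = ((pred - target) ** 2).mean(axis=axes)        # (n_fields,)
    # Dividing by range^2 puts every field on a comparable, dimensionless scale.
    return float(np.mean(per_field_mse / field_range**2))
```

For example, with a pressure field spanning [0, 1] and a velocity field spanning [0, 10], errors of 0.1 and 1.0 respectively each contribute about 0.01 after normalization, whereas a plain MSE over the same errors (about 0.505) would be driven almost entirely by velocity.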
-
Concatenating quantum error-correcting codes with decoherence-free subspaces and vice versa
Authors:
Nihar Ranjan Dash,
Sanjoy Dutta,
R. Srikanth,
Subhashish Banerjee
Abstract:
Quantum error-correcting codes (QECCs) and decoherence-free subspace (DFS) codes provide active and passive means, respectively, to address certain types of errors that arise during quantum computation. The latter technique is suited to correcting correlated errors with certain symmetries, and the former to correcting independent errors. The concatenation of a QECC and a DFS code results in a degenerate code that splits into actively and passively correcting parts, with the degeneracy impacting either part, leading to degenerate errors as well as degenerate stabilizer operators. The concatenation of the two types of code can aid universal fault-tolerant quantum computation when a mix of correlated and independent errors is encountered. In particular, we show that for sufficiently strongly correlated errors, the concatenation with the DFS as the inner code provides better entanglement fidelity, whereas for sufficiently independent errors, the concatenation with the QECC as the inner code is preferable. As illustrative examples, we examine in detail the concatenation of a two-qubit DFS code with either a three-qubit repetition code or the five-qubit Knill-Laflamme code, under independent and correlated errors.
Submitted 1 July, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
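Why independent and correlated errors favor different inner codes can be seen in a simple worked case. Assuming a three-qubit bit-flip repetition code with ideal syndrome measurement and majority-vote correction, and considering two extreme error models (each qubit flipped independently with probability p, versus all three qubits flipped together with probability p), the logical error probabilities are:

```latex
% Independent flips: correction fails when two or three qubits flip.
p_L^{\mathrm{ind}} = \binom{3}{2} p^2 (1-p) + p^3 = 3p^2 - 2p^3,
\qquad p_L^{\mathrm{ind}} < p \iff p < \tfrac{1}{2} \quad (0 < p < 1).

% Fully correlated flip: X \otimes X \otimes X maps |000> to |111>, i.e. it
% acts as the logical \bar{X}; the syndrome is trivial and the error is never corrected.
p_L^{\mathrm{corr}} = p.
```

The repetition code suppresses independent errors for p < 1/2 but gives no protection against the fully correlated error, which is the regime the abstract associates with placing the DFS as the inner code.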
-
Assembling PNIPAM-Capped Gold Nanoparticles in Aqueous Solutions
Authors:
Binay P. Nayak,
Hyeong Jin Kim,
Srikanth Nayak,
Wenjie Wang,
Wei Bu,
Surya K. Mallapragada,
David Vaknin
Abstract:
Employing small angle X-ray scattering (SAXS), we explore the conditions under which the assembly of gold nanoparticles (AuNPs) grafted with the thermo-sensitive polymer Poly(N-isopropylacrylamide) (PNIPAM) emerges. We find that short-range order assembly emerges by combining the addition of electrolytes or poly-electrolytes with raising the temperature of the suspensions above the lower-critical solution temperature (LCST) of PNIPAM. Our results show that the longer the PNIPAM chain, the better the organization of the assembled clusters. Interestingly, without added electrolytes, there is no evidence of AuNP assembly as a function of temperature, although untethered PNIPAM is known to undergo a coil-to-globule transition above its LCST. This study demonstrates another approach to assembling potential thermo-sensitive nanostructures for devices by leveraging the unique properties of PNIPAM.
Submitted 7 December, 2023;
originally announced December 2023.
-
An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets
Authors:
Maya Srikanth,
Jeremy Irvin,
Brian Wesley Hill,
Felipe Godoy,
Ishan Sabane,
Andrew Y. Ng
Abstract:
Major advancements in computer vision can primarily be attributed to the use of labeled datasets. However, acquiring labels for datasets often results in errors which can harm model performance. Recent works have proposed methods to automatically identify mislabeled images, but developing strategies to effectively implement them in real-world datasets has been sparsely explored. Towards improved data-centric methods for cleaning real-world vision datasets, we first conduct more than 200 experiments carefully benchmarking recently developed automated mislabel detection methods on multiple datasets under a variety of synthetic and real noise settings with varying noise levels. We compare these methods to a Simple and Efficient Mislabel Detector (SEMD) that we craft, and find that SEMD performs similarly to or outperforms prior mislabel detection approaches. We then apply SEMD to multiple real-world computer vision datasets and test how dataset size, mislabel removal strategy, and mislabel removal amount further affect model performance after retraining on the cleaned data. With careful design of the approach, we find that mislabel removal leads to per-class performance improvements of up to 8% for a retrained classifier in smaller data regimes.
Submitted 2 December, 2023;
originally announced December 2023.
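The mislabel-detection abstract does not spell out how SEMD works, so the sketch below shows a different, generic technique: a simplified confident-learning-style rule. A sample is flagged when, among the classes whose predicted probability reaches that class's average self-confidence, the most probable one disagrees with the given label. The out-of-sample probabilities (for example from cross-validation) are assumed inputs, every class is assumed to appear at least once among the labels, and the function name `flag_mislabels` is hypothetical.

```python
import numpy as np


def flag_mislabels(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Flag likely mislabeled samples (generic rule, not SEMD).

    pred_probs: (n, C) out-of-sample predicted class probabilities.
    labels: (n,) given integer labels in [0, C); every class must occur.
    Returns a boolean mask of suspected label errors.
    """
    n, C = pred_probs.shape
    # Per-class threshold: mean confidence in class c among samples labeled c.
    thresholds = np.array([pred_probs[labels == c, c].mean() for c in range(C)])
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        # Classes this sample plausibly belongs to, judged by per-class thresholds.
        above = np.flatnonzero(pred_probs[i] >= thresholds)
        if above.size:
            likely = above[np.argmax(pred_probs[i, above])]
            flags[i] = likely != labels[i]
    return flags
```

Per-class thresholds, rather than one global cutoff, keep classes that the model finds intrinsically harder from being flagged wholesale; removing the flagged samples and retraining is then one possible cleaning strategy.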
-
Yielding behaviour of active particles in bulk and in confinement
Authors:
Yagyik Goswami,
G. V. Shivashankar,
Srikanth Sastry
Abstract:
The investigation of collective behaviour in dense assemblies of self-propelled active particles has been motivated by a wide range of biological phenomena. Of particular interest are dynamical transitions of cellular and sub-cellular biological assemblies, including the cytoskeleton and the cell nucleus. Motivated by observations of mechanically induced changes in the dynamics of such systems, and the apparent role of confinement geometry, we study the transition between jammed and fluidized states of active particle assemblies, as a function of the strength and temporal persistence of active forces, and in different confinement geometries. Our results show that the fluidization transition broadly resembles yielding in amorphous solids, consistent with recent suggestions. More specifically, however, we find that a detailed analogy holds with the yielding transition under cyclic shear deformation, for finite persistence times. The fluidization transition is accompanied by driving-induced annealing, strong dependence on the initial state of the system, a divergence of time scales to reach steady states, and a discontinuous onset of diffusive motion. We also observe a striking dependence of the transition on persistence times and on the nature of the confinement. Collectively, our results have implications for epigenetic cell state transitions induced by alterations in confinement geometry.
Submitted 3 December, 2023;
originally announced December 2023.
-
PEFTDebias: Capturing debiasing information using PEFTs
Authors:
Sumit Agarwal,
Aditya Srikanth Veerubhotla,
Srijan Bansal
Abstract:
The increasing use of foundation models highlights the urgent need to address and eliminate the implicit biases they acquire during pretraining. In this paper, we introduce PEFTDebias, a novel approach that employs parameter-efficient fine-tuning (PEFT) to mitigate the biases within foundation models. PEFTDebias consists of two main phases: an upstream phase for acquiring debiasing parameters along a specific bias axis, and a downstream phase where these parameters are incorporated into the model and frozen during the fine-tuning process. By evaluating on four datasets across two bias axes, namely gender and race, we find that downstream biases can be effectively reduced with PEFTs. In addition, we show that these parameters possess axis-specific debiasing characteristics, enabling their effective transferability in mitigating biases in various downstream tasks. To ensure reproducibility, we release the code for our experiments.
Submitted 1 December, 2023;
originally announced December 2023.
-
Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos
Authors:
Vaibhavi Lokegaonkar,
Vijay Jaisankar,
Pon Deepika,
Madhav Rao,
T K Srikanth,
Sarbani Mallick,
Manjit Sodhi
Abstract:
Conventionally, evaluation for the diagnosis of Autism spectrum disorder is done by a trained specialist through questionnaire-based formal assessments and by observation of behavioral cues under various settings to capture the early warning signs of autism. These evaluation techniques are highly subjective and their accuracy relies on the experience of the specialist. Machine learning-based methods that automatically capture early signs of autism from recorded videos of children are therefore a promising alternative. In this paper, the authors propose a novel pipelined deep learning architecture to detect certain self-stimulatory behaviors that help in the diagnosis of autism spectrum disorder (ASD). The authors also supplement their tool with an augmented version of the Self Stimulatory Behavior Dataset (SSBD) and propose a new label for SSBD action detection: no-class. The deep learning model and the new dataset are made freely available for easy adoption by the research and developer community. The proposed pipeline, which targets real-time and hands-free automated diagnosis, achieved an overall accuracy of around 81%. All of the source code, data, licenses of use, and other relevant material are freely available at https://github.com/sarl-iiitb/
Submitted 25 November, 2023;
originally announced November 2023.
-
GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction
Authors:
Jia Huang,
Peng Jiang,
Alvika Gautam,
Srikanth Saripalli
Abstract:
Predicting pedestrian behavior is key to ensuring the safety and reliability of autonomous vehicles. While deep learning methods that learn from annotated video frame sequences have shown promise, they often fail to fully grasp the dynamic interactions between pedestrians and traffic that are crucial for accurate predictions. These models also lack nuanced common sense reasoning. Moreover, manually annotating datasets for these models is expensive, and the models are challenging to adapt to new situations. The advent of Vision Language Models (VLMs) introduces promising alternatives to these issues, thanks to their advanced visual and causal reasoning skills. To our knowledge, this research is the first to conduct both quantitative and qualitative evaluations of VLMs in the context of pedestrian behavior prediction for autonomous driving. We evaluate GPT-4V(ision) on publicly available pedestrian datasets: JAAD and WiDEVIEW. Our quantitative analysis focuses on GPT-4V's ability to predict pedestrian behavior in current and future frames. The model achieves 57% accuracy in a zero-shot manner, which, while impressive, is still behind the state-of-the-art domain-specific models (70%) in predicting pedestrian crossing actions. Qualitatively, GPT-4V shows an impressive ability to process and interpret complex traffic scenarios, differentiate between various pedestrian behaviors, and detect and analyze groups. However, it faces challenges, such as difficulty in detecting smaller pedestrians and assessing the relative motion between pedestrians and the ego vehicle.
Submitted 25 January, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.