-
Digitization of Document and Information Extraction using OCR
Authors:
Rasha Sinha,
Rekha B S
Abstract:
Retrieving accurate details from documents is a crucial task, especially when handling a combination of scanned images and native digital formats. This document presents a combined framework for text extraction that merges Optical Character Recognition (OCR) techniques with Large Language Models (LLMs) to deliver structured outputs enriched by contextual understanding and confidence indicators. Scanned files are processed using OCR engines, while digital files are interpreted through layout-aware libraries. The extracted raw text is subsequently analyzed by an LLM to identify key-value pairs and resolve ambiguities. A comparative analysis of different OCR tools is presented to evaluate their effectiveness concerning accuracy, layout recognition, and processing speed. The approach demonstrates significant improvements over traditional rule-based and template-based methods, offering enhanced flexibility and semantic precision across different document categories.
Submitted 11 June, 2025;
originally announced June 2025.
-
Microservices and Real-Time Processing in Retail IT: A Review of Open-Source Toolchains and Deployment Strategies
Authors:
Aaditaa Vashisht,
Rekha B S
Abstract:
With the rapid pace of digital transformation, the retail industry increasingly depends on real-time, scalable, and resilient systems to manage financial transactions, analyze customer behavior, and streamline order processing. This literature review explores how modern event-driven and microservices-based architectures, particularly those leveraging Apache Kafka, Spring Boot, MongoDB, and Kubernetes, are transforming retail and financial systems. By systematically reviewing academic publications, technical white papers, and industry reports from recent years, this study synthesizes key themes and implementation strategies. The analysis reveals that technologies like Kafka and Spring Boot are instrumental in building low-latency, event-driven applications that support real-time analytics and fraud detection, while MongoDB, when deployed on Kubernetes, ensures fault tolerance and high availability in inventory and transaction systems. Kubernetes itself plays a crucial role in automating the deployment and scaling of microservices. These findings provide valuable insights for industry practitioners aiming to design scalable infrastructures, identify research opportunities in hybrid deployment models, and offer educators a foundation to integrate modern system architectures into professional and technical communication training.
Submitted 11 June, 2025;
originally announced June 2025.
-
A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma
Authors:
Garima Jain,
Sanghamitra Pati,
Mona Duggal,
Amit Sethi,
Abhijeet Patil,
Gururaj Malekar,
Nilesh Kowe,
Jitender Kumar,
Jatin Kashyap,
Divyajeet Rout,
Deepali,
Hitesh,
Nishi Halduniya,
Sharat Kumar,
Heena Tabassum,
Rupinder Singh Dhaliwal,
Sucheta Devi Khuraijam,
Sushma Khuraijam,
Sharmila Laishram,
Simmi Kharb,
Sunita Singh,
K. Swaminadtan,
Ranjana Solanki,
Deepika Hemranjani,
Shashank Nath Singh
, et al. (12 additional authors not shown)
Abstract:
Oral squamous cell carcinoma (OSCC) is a major global health burden, particularly in several regions across Asia, Africa, and South America, where it accounts for a significant proportion of cancer cases. Early detection dramatically improves outcomes, with stage I cancers achieving up to 90 percent survival. However, traditional diagnosis based on histopathology has limited accessibility in low-resource settings because it is invasive, resource-intensive, and reliant on expert pathologists. On the other hand, oral cytology of brush biopsy offers a minimally invasive and lower-cost alternative, provided that the remaining challenges, inter-observer variability and the unavailability of expert pathologists, can be addressed using artificial intelligence. Development and validation of robust AI solutions require access to large, labeled, multi-source datasets to train high-capacity models that generalize across domain shifts. We introduce the first large, multicenter oral cytology dataset, comprising annotated slides stained with Papanicolaou (PAP) and May-Grunwald-Giemsa (MGG) protocols, collected from ten tertiary medical centers in India. The dataset, labeled and annotated by expert pathologists for cellular anomaly classification and detection, is designed to advance AI-driven diagnostic methods. By filling the gap in publicly available oral cytology datasets, this resource aims to enhance automated detection, reduce diagnostic errors, and improve early OSCC diagnosis in resource-constrained settings, ultimately contributing to reduced mortality and better patient outcomes worldwide.
Submitted 11 June, 2025;
originally announced June 2025.
-
DROID: Discrete-Time Simulation for Ring-Oscillator-Based Ising Design
Authors:
Abhimanyu Kumar,
Ramprasath S.,
Chris H. Kim,
Ulya R. Karpuzcu,
Sachin S. Sapatnekar
Abstract:
Many combinatorial problems can be mapped to Ising machines, i.e., networks of coupled oscillators that settle to a minimum-energy ground state, from which the problem solution is inferred. This work proposes DROID, a novel event-driven method for simulating the evolution of a CMOS Ising machine to its ground state. The approach is accurate under general delay-phase relations that include the effects of the transistor nonlinearities and is computationally efficient. On a realistic-size all-to-all coupled ring oscillator array, DROID is nearly four orders of magnitude faster than a traditional HSPICE simulation in predicting the evolution of a coupled oscillator system and is demonstrated to attain a similar distribution of solutions as the hardware.
Submitted 26 February, 2025;
originally announced February 2025.
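The mapping from a combinatorial problem to an Ising ground state mentioned in the abstract can be illustrated with a small sketch. This is not DROID and does not simulate oscillators; it only evaluates the standard Ising energy H(s) = -Σ_{i<j} J_ij s_i s_j and finds the ground state by brute force. The coupling matrix, the 4-node ring graph, and all function names are hypothetical choices made for illustration.

```python
from itertools import product

def ising_energy(J, spins):
    """Energy H = -sum_{i<j} J[i][j] * s_i * s_j for spins in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def ground_states(J):
    """Exhaustively search all 2^n spin assignments (feasible only for tiny n)."""
    n = len(J)
    best_energy, best = None, []
    for spins in product((-1, 1), repeat=n):
        e = ising_energy(J, spins)
        if best_energy is None or e < best_energy:
            best_energy, best = e, [spins]
        elif e == best_energy:
            best.append(spins)
    return best_energy, best

# Hypothetical 4-node ring graph with antiferromagnetic coupling (J = -1)
# on every edge; minimizing H then maximizes the number of cut edges.
J = [[0, -1, 0, -1],
     [-1, 0, -1, 0],
     [0, -1, 0, -1],
     [-1, 0, -1, 0]]
energy, states = ground_states(J)
```

Here the two ground states are the alternating spin patterns, which correspond to the maximum cut of the ring. An exhaustive search scales as 2^n, which is why hardware Ising machines and fast simulators of them are of interest.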
-
Soft is Safe: Human-Robot Interaction for Soft Robots
Authors:
Rajashekhar V S,
Gowdham Prabhakar
Abstract:
With the presence of robots in society increasing, interacting with robots is becoming necessary. The field of Human-Robot Interaction (HRI) has grown in importance as more repetitive and tiresome jobs are done by robots. In recent times, soft robotics has seen a boom in research and commercialization. Industry 5.0 focuses on human-robot collaboration, which also spurs the field of soft robotics. However, HRI for soft robotics is still at a nascent stage. In this work, we review and discuss how HRI is done for soft robots. We first discuss the control, design, materials, and manufacturing of soft robots, which provides an understanding of what is being interacted with. We then discuss the various input and output modalities used in HRI. The applications of HRI for soft robots found in the literature are discussed in detail, followed by the limitations of HRI for soft robots and the research opportunities that exist in this field. We conclude that there is considerable scope for the development of HRI for soft robots.
Submitted 3 February, 2025;
originally announced February 2025.
-
AI Guided Early Screening of Cervical Cancer
Authors:
Dharanidharan S I,
Suhitha Renuka S V,
Ajishi Singh,
Sheena Christabel Pravin
Abstract:
To support the creation of reliable machine learning models for anomaly detection, this project focuses on preprocessing, enhancing, and organizing a medical imaging dataset. The dataset contains two classes, normal and abnormal, along with additional noise variations. To improve image quality, undesirable artifacts, including visible medical equipment at the edges, were removed using central cropping. Brightness and contrast adjustment was an additional preprocessing step, after which the data was normalized. To simplify classification, several image subsets were combined into two primary categories: normal and pathological. To provide a robust training set that adapts well to real-world situations, sophisticated image preprocessing techniques were used, such as contrast enhancement and real-time augmentation (including rotations, zooms, and brightness modifications). To enable sound model evaluation, the data was subsequently divided into training and testing subsets. This thorough approach ensures high-quality input data for building precise and effective machine learning models for medical anomaly detection. Because the project pipeline is flexible and scalable, it can be easily integrated with larger clinical decision-support systems.
Submitted 19 November, 2024;
originally announced November 2024.
-
Random Heterogeneous Neurochaos Learning Architecture for Data Classification
Authors:
Remya Ajai A S,
Nithin Nagaraj
Abstract:
Inspired by the human brain's structure and function, Artificial Neural Networks (ANN) were developed for data classification. However, existing Neural Networks, including Deep Neural Networks, do not mimic the brain's rich structure. They lack key features such as randomness and neuron heterogeneity, which are inherently chaotic in their firing behavior. Neurochaos Learning (NL), a chaos-based neural network, recently employed one-dimensional chaotic maps like Generalized Lüroth Series (GLS) and Logistic map as neurons. For the first time, we propose a random heterogeneous extension of NL, where various chaotic neurons are randomly placed in the input layer, mimicking the randomness and heterogeneous nature of human brain networks. We evaluated the performance of the newly proposed Random Heterogeneous Neurochaos Learning (RHNL) architectures combined with traditional Machine Learning (ML) methods. On public datasets, RHNL outperformed both homogeneous NL and fixed heterogeneous NL architectures in nearly all classification tasks. RHNL achieved high F1 scores on the Wine dataset (1.0), Bank Note Authentication dataset (0.99), Breast Cancer Wisconsin dataset (0.99), and Free Spoken Digit Dataset (FSDD) (0.98). These RHNL results are among the best in the literature for these datasets. We investigated RHNL performance on image datasets, where it outperformed stand-alone ML classifiers. In low training sample regimes, RHNL was the best among stand-alone ML. Our architecture bridges the gap between existing ANN architectures and the human brain's chaotic, random, and heterogeneous properties. We foresee the development of several novel learning algorithms centered around Random Heterogeneous Neurochaos Learning in the coming days.
Submitted 30 October, 2024;
originally announced October 2024.
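The idea of an input layer built from randomly mixed chaotic neurons can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each neuron iterates its map from a fixed initial activity `q` until the trajectory falls within `eps` of the normalized input, and it uses only the resulting firing time as a feature. The parameter values (`b`, `a`, `q`, `eps`) and all function names are hypothetical.

```python
import random

def skew_tent(x, b=0.499):
    """GLS (skew-tent) map on [0, 1]; b is a hypothetical skew parameter."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def logistic(x, a=4.0):
    """Logistic map x -> a * x * (1 - x)."""
    return a * x * (1.0 - x)

def firing_time(chaotic_map, stimulus, q=0.34, eps=0.05, max_iter=10_000):
    """Count iterations until the trajectory started at q comes within
    eps of the stimulus; returns max_iter if that never happens."""
    x = q
    for t in range(max_iter):
        if abs(x - stimulus) < eps:
            return t
        x = chaotic_map(x)
    return max_iter

def random_heterogeneous_layer(n_inputs, seed=0):
    """Assign one randomly chosen chaotic map to each input neuron."""
    rng = random.Random(seed)
    return [rng.choice((skew_tent, logistic)) for _ in range(n_inputs)]

def extract_features(layer, sample):
    """One firing-time feature per input; sample values are assumed in [0, 1]."""
    return [firing_time(neuron, x) for neuron, x in zip(layer, sample)]

layer = random_heterogeneous_layer(4, seed=0)
features = extract_features(layer, [0.10, 0.34, 0.75, 0.90])
```

The resulting feature vector would then be passed to a conventional classifier, in line with the abstract's description of combining the architecture with traditional ML methods.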
-
DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks
Authors:
Zohreh Aghababaeyan,
Manel Abdellatif,
Lionel Briand,
Ramesh S
Abstract:
Deep Neural Networks (DNNs) are increasingly deployed across applications. However, ensuring their reliability remains a challenge, and in many situations, alternative models with similar functionality and accuracy are available. Traditional accuracy-based evaluations often fail to capture behavioral differences between models, especially with limited test datasets, making it difficult to select or combine models effectively. Differential testing addresses this by generating test inputs that expose discrepancies in DNN model behavior. However, existing approaches face significant limitations: many rely on model internals or are constrained by available seed inputs. To address these challenges, we propose DiffGAN, a black-box test image generation approach for differential testing of DNN models. DiffGAN leverages a Generative Adversarial Network (GAN) and the Non-dominated Sorting Genetic Algorithm II to generate diverse and valid triggering inputs that reveal behavioral discrepancies between models. DiffGAN employs two custom fitness functions, focusing on diversity and divergence, to guide the exploration of the GAN input space and identify discrepancies between models' outputs. By strategically searching this space, DiffGAN generates inputs with specific features that trigger differences in model behavior. DiffGAN is black-box, making it applicable in more situations. We evaluate DiffGAN on eight DNN model pairs trained on widely used image datasets. Our results show DiffGAN significantly outperforms a SOTA baseline, generating four times more triggering inputs, with greater diversity and validity, within the same budget. Additionally, the generated inputs improve the accuracy of a machine learning-based model selection mechanism, which selects the best-performing model based on input characteristics and can serve as a smart output voting mechanism when using alternative models.
Submitted 11 May, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
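The multi-objective selection at the heart of NSGA-II starts from Pareto dominance. The sketch below shows only that first step, extracting the non-dominated front for two maximized objectives; it omits the rest of the algorithm (ranking of later fronts, crowding distance, crossover, mutation) and is not DiffGAN's code. The fitness pairs are hypothetical stand-ins for diversity and divergence scores.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def non_dominated_front(scores):
    """Indices of candidates that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s)
                       for j, t in enumerate(scores) if j != i)]

# Hypothetical (diversity, divergence) fitness pairs for five candidate inputs
scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
front = non_dominated_front(scores)
```

Candidates on the front represent different trade-offs between the two objectives, which is why a search of this kind can return a diverse set of inputs instead of a single best one.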
-
An Exploration of Agile Methods in the Automotive Industry: Benefits, Challenges and Opportunities
Authors:
Mehrnoosh Askarpour,
Sahar Kokaly,
Ramesh S
Abstract:
Agile methodologies have gained significant traction in the software development industry, promising increased flexibility and responsiveness to changing requirements. However, their applicability to safety-critical systems, particularly in the automotive sector, remains a topic of debate. This paper examines the benefits and challenges of implementing agile methods in the automotive industry through a comprehensive review of relevant literature and case studies. Our findings highlight the potential advantages of agile approaches, such as improved collaboration and faster time-to-market, as well as the inherent challenges, including safety compliance and cultural resistance. By synthesizing existing research and practical insights, this paper aims to provide an understanding of the role of agile methods in shaping the future of automotive software development.
Submitted 19 September, 2024;
originally announced September 2024.
-
NSSR-DIL: Null-Shot Image Super-Resolution Using Deep Identity Learning
Authors:
Sree Rama Vamsidhar S,
Rama Krishna Gorthi
Abstract:
The present State-of-the-Art (SotA) Image Super-Resolution (ISR) methods employ Deep Learning (DL) techniques using a large amount of image data. The primary limitation to extending the existing SotA ISR works to real-world instances is their computational and time complexity. In this paper, contrary to the existing methods, we present a novel and computationally efficient ISR algorithm that does not depend on an image dataset to learn the ISR task. The proposed algorithm reformulates the ISR task from generating the Super-Resolved (SR) images to computing the inverse of the kernels that span the degradation space. We introduce Deep Identity Learning, exploiting the identity relation between the degradation and inverse degradation models. The proposed approach relies neither on an ISR dataset nor on a single input low-resolution (LR) image (as the self-supervised method ZSSR does) to model the ISR task. Hence we term our model Null-Shot Super-Resolution Using Deep Identity Learning (NSSR-DIL). The proposed NSSR-DIL model requires fewer computational resources, by at least an order of magnitude, and demonstrates competitive performance on benchmark ISR datasets. Another salient aspect of our proposition is that the NSSR-DIL framework avoids retraining the model and remains the same for varying scale factors like X2, X3, and X4. This makes our highly efficient ISR model more suitable for real-world applications.
Submitted 16 September, 2024;
originally announced September 2024.
-
Soft Acoustic Curvature Sensor: Design and Development
Authors:
Mohammad Sheikh Sofla,
Hanita Golshanian,
Vishnu Rajendran S,
Amir Ghalamzan E
Abstract:
This paper introduces a novel Soft Acoustic Curvature (SAC) sensor. SAC incorporates integrated audio components and features an acoustic channel within a flexible structure. A reference acoustic wave, generated by a speaker at one end of the channel, propagates and is received by a microphone at the other end of the channel. Our previous study revealed that acoustic wave energy dissipation varies with acoustic channel deformation, leading us to design a novel channel capable of large deformation due to bending. We then use Machine Learning (ML) models to establish a complex mapping between channel deformations and sound modulation. Various sound frequencies and ML models were evaluated to enhance curvature detection accuracy. The sensor, constructed using soft material and 3D printing, was validated experimentally, with curvature measurement errors remaining within 3.5 m⁻¹ for curvatures ranging from 0 to 60 m⁻¹. These results demonstrate the effectiveness of the proposed method for estimating curvatures. With its flexible structure, the SAC sensor holds potential for applications in soft robotics, including shape measurement for continuum manipulators, soft grippers, and wearable devices.
Submitted 27 September, 2024; v1 submitted 10 September, 2024;
originally announced September 2024.
-
Heads Up eXperience (HUX): Always-On AI Companion for Human Computer Environment Interaction
Authors:
Sukanth K,
Sudhiksha Kandavel Rajan,
Rajashekhar V S,
Gowdham Prabhakar
Abstract:
While current personal smart devices excel in digital domains, they fall short in assisting users during human environment interaction. This paper proposes Heads Up eXperience (HUX), an AI system designed to bridge this gap, serving as a constant companion across the extended reality (XR) environments. By tracking the user's eye gaze, analyzing the surrounding environment, and interpreting verbal contexts, the system captures and enhances multi-modal data, providing holistic context interpretation and memory storage in real-time task specific situations. This comprehensive approach enables more natural, empathetic and intelligent interactions between the user and HUX AI, paving the path for human computer environment interaction. Intended for deployment in smart glasses and extended reality headsets, HUX AI aims to become a personal and useful AI companion for daily life. By integrating digital assistance with enhanced physical world interactions, this technology has the potential to revolutionize human-AI collaboration in both personal and professional spheres paving the way for the future of personal smart devices.
Submitted 28 July, 2024;
originally announced July 2024.
-
Effective-LDAM: An Effective Loss Function To Mitigate Data Imbalance for Robust Chest X-Ray Disease Classification
Authors:
Sree Rama Vamsidhar S,
Bhargava Satya,
Rama Krishna Gorthi
Abstract:
Deep Learning (DL) approaches have gained prominence in medical imaging for disease diagnosis. Among these, Chest X-ray (CXR) classification has proven to be an effective approach for detecting and analyzing various diseases. However, the reliable performance of DL classification algorithms depends on access to large and balanced datasets, which poses challenges in medical imaging because acquiring sufficient data for every disease category is impractical. To tackle this problem, we propose an algorithm-centric approach called Effective-Label Distribution Aware Margin (E-LDAM), which modifies the margin of the widely adopted Label Distribution Aware Margin (LDAM) loss function using an effective number of samples in each class. Experimental evaluations on the COVIDx CXR dataset focus on Normal, Pneumonia, and COVID-19 classification. The experimental results demonstrate the effectiveness of the proposed E-LDAM approach, which achieves a recall of 97.81% for the minority class (COVID-19) in CXR image prediction. Furthermore, the overall accuracy of the three-class classification task reaches 95.26%.
Submitted 6 July, 2024;
originally announced July 2024.
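The abstract does not state the E-LDAM formula, so the following is only a sketch of one plausible reading. It assumes the per-class LDAM margin, which is proportional to n_j^(-1/4) for a class with n_j samples, is computed from the effective number of samples E_n = (1 - β^n) / (1 - β) in place of the raw count. The class counts, `beta`, and `max_margin` values are hypothetical, and the logit scaling factor used in typical LDAM implementations is omitted for brevity.

```python
import math

def effective_number(n, beta=0.999):
    """Effective number of samples: (1 - beta^n) / (1 - beta)."""
    return (1.0 - beta ** n) / (1.0 - beta)

def margins(class_counts, beta=0.999, max_margin=0.5):
    """Per-class margins proportional to E_n^(-1/4), rescaled so the largest
    margin equals max_margin. Using E_n here is an assumption."""
    raw = [effective_number(n, beta) ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]

def margin_cross_entropy(logits, target, class_margins):
    """Cross-entropy after subtracting the target class's margin from its logit."""
    shifted = list(logits)
    shifted[target] -= class_margins[target]
    peak = max(shifted)  # subtract the peak for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in shifted))
    return log_sum - shifted[target]

# Hypothetical class counts for (Normal, Pneumonia, COVID-19)
counts = [8000, 5500, 500]
m_list = margins(counts)
loss = margin_cross_entropy([2.0, 0.5, 1.0], target=2, class_margins=m_list)
```

Under this reading the minority class receives the largest margin, so the model is pushed to separate it more strongly. Because the effective number saturates for large classes, the margins of the two majority classes end up close to each other.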
-
Serpentine Synergy: Design and Fabrication of a Dual Soft Continuum Manipulator and Soft Snake Robot
Authors:
Rajashekhar V S,
Aravinth Rajesh,
Muhammad Imam Anugrahadi Athaaillah,
Gowdham Prabhakar
Abstract:
This work presents a soft continuum robot (SCR) that can be used as a soft continuum manipulator (SCM) and a soft snake robot (SSR). This is achieved using expanded polyethylene foam (EPE) modules as the soft material. In situations like post-earthquake search operations, these dual-purpose robots could play a vital role. The soft continuum manipulator with a camera attached to the tip can manually search for survivors in the debris. On the other hand, the soft snake robot can be made by attaching an active wheel to the soft continuum manipulator. This mobile robot can reach places humans cannot and gather information about survivors. This work presents the design, fabrication, and experimental validation of the dual soft continuum robot.
Submitted 5 July, 2024;
originally announced July 2024.
-
Narrow Transformer: StarCoder-Based Java-LM For Desktop
Authors:
Kamalkumar Rathinasamy,
Balaji A J,
Ankush Kumar,
Gagan Gayari,
Harshini K,
Rajab Ali Mondal,
Sreenivasa Raghavan K S,
Swayam Singh,
Mohammed Rafee Tarafdar
Abstract:
This paper presents NT-Java-1.1B, an open-source specialized code language model built on StarCoderBase-1.1B, designed for coding tasks in Java programming. NT-Java-1.1B achieves state-of-the-art performance, surpassing its base model and the majority of other models of similar size on the MultiPL-E Java code benchmark. While there have been studies on extending large, generic pre-trained models to improve proficiency in specific programming languages like Python, similar investigations on small code models for other programming languages are lacking. Large code models require specialized hardware like GPUs for inference, highlighting the need for research into building small code models that can be deployed on developer desktops. This paper addresses this research gap by focusing on the development of a small Java code model, NT-Java-1.1B, and its quantized versions, which perform comparably to open models around 1.1B on MultiPL-E Java code benchmarks, making them ideal for desktop deployment. This paper establishes the foundation for specialized models across languages and sizes for a family of NT Models.
Submitted 7 September, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Unifying Mixed Gas Adsorption in Molecular Sieve Membranes and MOFs using Machine Learning
Authors:
Subhadeep Dasgupta,
Amal R S,
Prabal K. Maiti
Abstract:
Recent machine learning models for accurately obtaining gas adsorption isotherms focus on polymers or metal-organic frameworks (MOFs) separately. Creating a unified model that can predict the adsorption trends in both types of adsorbents is challenging, owing to the diversity in their chemical structures. Moreover, models trained only on single gas adsorption data are incapable of predicting adsorption isotherms for binary gas mixtures. In this work, we address these problems using feature vectors comprising only the physical properties of the gas mixtures and adsorbents. Our model is trained on adsorption isotherms of both single and binary mixed gases inside a carbon molecular sieving membrane (CMSM), together with data available from the CoRE MOF database. The trained models are capable of accurately predicting the adsorption trends in both classes of materials, for both pure and binary components. An ML architecture designed for one class of material is not suitable for predicting the other class, even after proper training, signifying that the model must be trained jointly for proper predictions and transferability. The model is used to predict, with good accuracy, the CO2 uptake inside the CALF-20 framework. This work opens up a new avenue for predicting complex adsorption processes for gas mixtures in a wide range of materials.
Submitted 19 June, 2024;
originally announced June 2024.
-
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
Authors:
Chandra Kiran Reddy Evuru,
Sreyan Ghosh,
Sonal Kumar,
Ramaneswaran S,
Utkarsh Tyagi,
Dinesh Manocha
Abstract:
We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf instruction-following Large Language Models (LLMs) to generate text that satisfies a set of constraints. Specifically, we extract a set of simple constraints from every instance in the low-resource dataset and verbalize them to prompt an LLM to generate novel and diverse training instances. Our findings reveal that synthetic data that follows simple constraints in the downstream dataset acts as a highly effective augmentation, and CoDa can achieve this without intricate decoding-time constrained generation techniques or fine-tuning with complex algorithms that eventually make the model biased toward the small number of training instances. Additionally, CoDa is the first framework that provides users explicit control over the augmentation generation process, thereby also allowing easy adaptation to several domains. We demonstrate the effectiveness of CoDa across 11 datasets spanning 3 tasks and 3 low-resource settings. CoDa outperforms all our baselines, qualitatively and quantitatively, with improvements of 0.12%-7.19%. Code is available here: https://github.com/Sreyan88/CoDa
Submitted 30 March, 2024;
originally announced April 2024.
-
Emotion-Aware Multimodal Fusion for Meme Emotion Detection
Authors:
Shivam Sharma,
Ramaneswaran S,
Md. Shad Akhtar,
Tanmoy Chakraborty
Abstract:
The ever-evolving social media discourse has witnessed an overwhelming use of memes to express opinions or dissent. Besides being misused for spreading malcontent, they are mined by corporations and political parties to glean the public's opinion. Therefore, memes predominantly offer affect-enriched insights towards ascertaining the societal psyche. However, the current approaches are yet to model the affective dimensions expressed in memes effectively. They rely extensively on large multimodal datasets for pre-training and do not generalize well due to constrained visual-linguistic grounding. In this paper, we introduce MOOD (Meme emOtiOns Dataset), which embodies six basic emotions. We then present ALFRED (emotion-Aware muLtimodal Fusion foR Emotion Detection), a novel multimodal neural framework that (i) explicitly models emotion-enriched visual cues, and (ii) employs an efficient cross-modal fusion via a gating mechanism. Our investigation establishes ALFRED's superiority over existing baselines by 4.94% F1. Additionally, ALFRED competes strongly with previous best approaches on the challenging Memotion task. We then discuss ALFRED's domain-agnostic generalizability by demonstrating its dominance on two recently-released datasets - HarMeme and Dank Memes, over other baselines. Further, we analyze ALFRED's interpretability using attention maps. Finally, we highlight the inherent challenges posed by the complex interplay of disparate modality-specific cues toward meme analysis.
Submitted 15 March, 2024;
originally announced March 2024.
-
A Closer Look at the Limitations of Instruction Tuning
Authors:
Sreyan Ghosh,
Chandra Kiran Reddy Evuru,
Sonal Kumar,
Ramaneswaran S,
Deepali Aneja,
Zeyu Jin,
Ramani Duraiswami,
Dinesh Manocha
Abstract:
Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.
Submitted 14 July, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing
Authors:
Dhruv Patel,
Shivani Chepuri,
Sarvesh Thakur,
K. Harikumar,
Ravi Kiran S.,
K. Madhava Krishna
Abstract:
Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and count the number of windows of a building by deploying an Unmanned Aerial Vehicle (UAV) based remote sensing system is proposed. The proposed two-stage method automates the identification and counting of windows by developing computer vision pipelines that utilize data from the UAV's onboard camera and other sensors. Quantitative and qualitative results show the effectiveness of our proposed approach in accurately detecting and counting the windows compared to the existing method.
Submitted 24 November, 2023;
originally announced November 2023.
-
From Multilingual Complexity to Emotional Clarity: Leveraging Commonsense to Unveil Emotions in Code-Mixed Dialogues
Authors:
Shivani Kumar,
Ramaneswaran S,
Md Shad Akhtar,
Tanmoy Chakraborty
Abstract:
Understanding emotions during conversation is a fundamental aspect of human communication, driving NLP research for Emotion Recognition in Conversation (ERC). While considerable research has focused on discerning emotions of individual speakers in monolingual dialogues, understanding the emotional dynamics in code-mixed conversations has received relatively less attention. This motivates our undertaking of ERC for code-mixed conversations in this study. Recognizing that emotional intelligence encompasses a comprehension of worldly knowledge, we propose an innovative approach that integrates commonsense information with dialogue context to facilitate a deeper understanding of emotions. To achieve this, we devise an efficient pipeline that extracts relevant commonsense from existing knowledge graphs based on the code-mixed input. Subsequently, we develop an advanced fusion technique that seamlessly combines the acquired commonsense information with the dialogue representation obtained from a dedicated dialogue understanding module. Our comprehensive experimentation showcases the substantial performance improvement obtained through the systematic incorporation of commonsense in ERC. Both quantitative assessments and qualitative analyses further corroborate the validity of our hypothesis, reaffirming the pivotal role of commonsense integration in enhancing ERC.
Submitted 19 October, 2023;
originally announced October 2023.
-
Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy
Authors:
Rohith Arumugam S,
Angel Deborah S
Abstract:
This report focuses on enhancing a binary code comment quality classification model by integrating generated code and comment pairs, to improve model accuracy. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful." Additionally, code and comment pairs are generated using a Large Language Model Architecture, and these generated pairs are labeled to indicate their utility. The outcome of this effort consists of two classification models: one utilizing the original dataset and another incorporating the augmented dataset with the newly generated code comment pairs and labels.
Submitted 14 October, 2023;
originally announced October 2023.
-
BeSt-LeS: Benchmarking Stroke Lesion Segmentation using Deep Supervision
Authors:
Prantik Deb,
Lalith Bharadwaj Baru,
Kamalaker Dadi,
Bapi Raju S
Abstract:
Brain stroke has become a significant burden on global health and thus we need remedies and prevention strategies to overcome this challenge. For this, the immediate identification of stroke and risk stratification is the primary task for clinicians. To aid expert clinicians, automated segmentation models are crucial. In this work, we consider the publicly available dataset ATLAS $v2.0$ to benchmark various end-to-end supervised U-Net style models. Specifically, we have benchmarked models on both 2D and 3D brain images and evaluated them using standard metrics. We have achieved the highest Dice score of 0.583 on the 2D transformer-based model and 0.504 on the 3D residual U-Net respectively. We have conducted the Wilcoxon test for 3D models to correlate the relationship between predicted and actual stroke volume. For reproducibility, the code and model weights are made publicly available: https://github.com/prantik-pdeb/BeSt-LeS.
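For readers unfamiliar with the Dice score reported in this abstract, the following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for two binary masks. It is not taken from the BeSt-LeS code; the function name `dice_score`, the flat 0/1 list representation, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; real pipelines usually operate on
    2D or 3D arrays, which can be flattened before calling this function.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels/voxels")
    # Count positions where both masks mark a lesion.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of lesion positions in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no lesion pixels or voxels.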
Submitted 10 October, 2023;
originally announced October 2023.
-
3SAT on an All-to-All-Connected CMOS Ising Solver Chip
Authors:
Hüsrev Cılasun,
Ziqing Zeng,
Ramprasath S,
Abhimanyu Kumar,
Hao Lo,
William Cho,
Chris H. Kim,
Ulya R. Karpuzcu,
Sachin S. Sapatnekar
Abstract:
This work solves 3SAT, a classical NP-complete problem, on a CMOS-based Ising hardware chip with all-to-all connectivity. The paper addresses practical issues in going from algorithms to hardware. It considers several degrees of freedom in mapping the 3SAT problem to the chip - using multiple Ising formulations for 3SAT; exploring multiple strategies for decomposing large problems into subproblems that can be accommodated on the Ising chip; and executing a sequence of these subproblems on CMOS hardware to obtain the solution to the larger problem. These are evaluated within a software framework, and the results are used to identify the most promising formulations and decomposition techniques. These best approaches are then mapped to the all-to-all hardware, and the performance of 3SAT is evaluated on the chip. Experimental data shows that the deployed decomposition and mapping strategies impact SAT solution quality: without our methods, the CMOS hardware cannot achieve 3SAT solutions on SATLIB benchmarks.
Submitted 19 September, 2023;
originally announced September 2023.
-
Unleashing the Power of Dynamic Mode Decomposition and Deep Learning for Rainfall Prediction in North-East India
Authors:
Paleti Nikhil Chowdary,
Sathvika P,
Pranav U,
Rohan S,
Sowmya V,
Gopalakrishnan E A,
Dhanya M
Abstract:
Accurate rainfall forecasting is crucial for effective disaster preparedness and mitigation in the North-East region of India, which is prone to extreme weather events such as floods and landslides. In this study, we investigated the use of two data-driven methods, Dynamic Mode Decomposition (DMD) and Long Short-Term Memory (LSTM), for rainfall forecasting using daily rainfall data collected from the India Meteorological Department in the North-East region over a period of 118 years. We conducted a comparative analysis of these methods to determine their relative effectiveness in predicting rainfall patterns. Using historical rainfall data from multiple weather stations, we trained and validated our models to forecast future rainfall patterns. Our results indicate that both DMD and LSTM are effective in forecasting rainfall, with LSTM outperforming DMD in terms of accuracy, revealing that LSTM has the ability to capture complex nonlinear relationships in the data, making it a powerful tool for rainfall forecasting. Our findings suggest that data-driven methods such as DMD and deep learning approaches like LSTM can significantly improve rainfall forecasting accuracy in the North-East region of India, helping to mitigate the impact of extreme weather events and enhance the region's resilience to climate change.
Submitted 17 September, 2023;
originally announced September 2023.
-
Ten Years of Generative Adversarial Nets (GANs): A survey of the state-of-the-art
Authors:
Tanujit Chakraborty,
Ujjwal Reddy K S,
Shraddha M. Naik,
Madhurima Panja,
Bayapureddy Manvitha
Abstract:
Since their inception in 2014, Generative Adversarial Networks (GANs) have rapidly emerged as powerful tools for generating realistic and diverse data across various domains, including computer vision and other applied areas. Consisting of a discriminative network and a generative network engaged in a Minimax game, GANs have revolutionized the field of generative modeling. In February 2018, GAN secured the leading spot on the "Top Ten Global Breakthrough Technologies List" issued by the Massachusetts Science and Technology Review. Over the years, numerous advancements have been proposed, leading to a rich array of GAN variants, such as conditional GAN, Wasserstein GAN, CycleGAN, and StyleGAN, among many others. This survey aims to provide a general overview of GANs, summarizing the latent architecture, validation metrics, and application areas of the most widely recognized variants. We also delve into recent theoretical developments, exploring the profound connection between the adversarial principle underlying GAN and Jensen-Shannon divergence, while discussing the optimality characteristics of the GAN framework. The efficiency of GAN variants and their model architectures will be evaluated along with training obstacles as well as training solutions. In addition, a detailed discussion will be provided, examining the integration of GANs with newly developed deep learning frameworks such as Transformers, Physics-Informed Neural Networks, Large Language Models, and Diffusion models. Finally, we reveal several issues as well as future research outlines in this field.
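The Minimax game and the Jensen-Shannon connection mentioned in this abstract can be written out as follows. This is the standard formulation from the original 2014 GAN paper, not an excerpt from the survey itself; the symbols $G$, $D$, $p_{\text{data}}$, $p_z$, and $p_g$ (generator, discriminator, data distribution, noise prior, and generator distribution) are the usual notation and are assumed here.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator, the optimal discriminator is
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% Substituting D^{*} back gives the Jensen-Shannon form
V(D^{*}, G) = -\log 4 + 2\,\mathrm{JSD}\big(p_{\text{data}} \,\|\, p_g\big)
```

Because the Jensen-Shannon divergence is non-negative and vanishes only when the two distributions coincide, the value $-\log 4$ is attained exactly when $p_g = p_{\text{data}}$.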
Submitted 30 August, 2023;
originally announced August 2023.
-
SimSched: A tool for Simulating Autosar Implementation in Simulink
Authors:
Jian Chen,
Manar H. Alalfi,
Thomas R. Dean,
Ramesh S
Abstract:
AUTOSAR (AUTomotive Open System ARchitecture) is an open industry standard for the automotive sector. It defines the three-layered automotive software architecture. One of these layers is the application layer, where functional behaviors are encapsulated in Software Components (SW-Cs). Inside SW-Cs, a set of runnable entities represents the internal behavior and is realized as a set of tasks. To address AUTOSAR's lack of support for modeling behaviors of runnables, languages such as Simulink are employed. Simulink simulations assume Simulink block behaviors are completed in zero execution time, while real execution requires a finite execution time. This timing mismatch can result in failures to detect unexpected runtime behaviors during the simulation phase. This paper extends the Simulink environment to model the timing properties of tasks. We present a Simulink block that can schedule tasks with non-zero simulation times. It enables a more realistic analysis during model development.
Submitted 28 August, 2023;
originally announced August 2023.
-
Optimal Kinematic Design of a Robotic Lizard using Four-Bar and Five-Bar Mechanisms
Authors:
Rajashekhar V S,
Debasish Ghose,
Arockia Selvakumar Arockia Doss
Abstract:
Designing a mechanism to mimic the motion of a common house gecko is the objective of this work. The body of the robot is designed using four five-bar mechanisms (2-RRRRR and 2-RRPRR) and the legs are designed using four four-bar mechanisms. The 2-RRRRR five-bar mechanisms form the head and tail of the robotic lizard. The 2-RRPRR five-bar mechanisms form the left and right sides of the body in the robotic lizard. The four five-bar mechanisms are actuated by only four rotary actuators. Of these, two actuators control the head movements and the other two control the tail movements. Each RRPRR five-bar mechanism is driven by one actuator from the head five-bar mechanism and another from the tail five-bar mechanism. A tension spring connects each active link to a link in the four-bar mechanism. When the robot is actuated, the head, tail, and body move, and simultaneously each leg moves accordingly. This kind of actuation, where the motion is transferred from the body of the robot to the legs, is the novelty in our design. The dimensional synthesis of the robotic lizard is done and presented. Then the forward and inverse kinematics of the mechanism, and the identification of configuration space singularities for the robot, are presented. The gait exhibited by the gecko is studied and then simulated. A computer-aided design of the robotic lizard is created and a prototype is made by 3D printing the parts. The prototype is controlled using an Arduino UNO as the microcontroller. The experimental results are finally presented based on the gait analysis that was done earlier. Forward walking and turning motions are demonstrated and snapshots are presented.
Submitted 16 August, 2023;
originally announced August 2023.
-
HyperCoil-Recon: A Hypernetwork-based Adaptive Coil Configuration Task Switching Network for MRI Reconstruction
Authors:
Sriprabha Ramanarayanan,
Mohammad Al Fahim,
Rahul G. S.,
Amrit Kumar Jethi,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Parallel imaging, a fast MRI technique, involves dynamic adjustments based on the configuration, i.e., the number, positioning, and sensitivity of the coils with respect to the anatomy under study. Conventional deep learning-based image reconstruction models have to be trained or fine-tuned for each configuration, posing a barrier to clinical translation, given the lack of computational resources and machine learning expertise for clinicians to train models at deployment. Joint training on diverse datasets learns a single weight set that might underfit to deviated configurations. We propose HyperCoil-Recon, a hypernetwork-based coil configuration task-switching network for multi-coil MRI reconstruction that encodes varying configurations of the numbers of coils in a multi-tasking perspective, posing each configuration as a task. The hypernetworks infer and embed task-specific weights into the reconstruction network, 1) effectively utilizing the contextual knowledge of common and varying image features among the various fields-of-view of the coils, and 2) enabling generality to unseen configurations at test time. Experiments reveal that our approach 1) adapts on the fly to various unseen configurations up to 32 coils when trained on lower numbers (i.e., 7 to 11) of randomly varying coils, and to 120 deviated unseen configurations when trained on 18 configurations in a single model, 2) matches the performance of coil configuration-specific models, and 3) outperforms configuration-invariant models with improvement margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM for knee and brain data. Our code is available at https://github.com/sriprabhar/HyperCoil-Recon
Submitted 9 August, 2023;
originally announced August 2023.
-
An Autonomous Hybrid Drone-Rover Vehicle for Weed Removal and Spraying Applications in Agriculture
Authors:
J Krishna Kant,
Mahankali Sripaad,
Anand Bharadwaj,
Rajashekhar V S,
Suresh Sundaram
Abstract:
The usage of drones and rovers helps to overcome the limitations of traditional agriculture, which has been predominantly human-intensive, for carrying out tasks such as removal of weeds and spraying of fertilizers and pesticides. Drones and rovers are helping to realize precision agriculture and provide farmers with improved monitoring and surveying at affordable costs. Major benefits have come for vertical farming and fields with irrigation canals. However, drones have limited flight time due to payload constraints. Rovers have limitations in vertical farming and with obstacles like canals in agricultural fields. To meet the different requirements of multiple terrains and vertical farming in agriculture, we propose an autonomous hybrid drone-rover vehicle that combines the advantages of both rovers and drones. The prototype is described along with experimental results regarding its ability to avoid obstacles, pluck weeds and spray pesticides.
Submitted 9 August, 2023;
originally announced August 2023.
-
SDLFormer: A Sparse and Dense Locality-enhanced Transformer for Accelerated MR Image Reconstruction
Authors:
Rahul G. S.,
Sriprabha Ramnarayanan,
Mohammad Al Fahim,
Keerthi Ram,
Preejith S. P,
Mohanasankar Sivaprakasam
Abstract:
Transformers have emerged as viable alternatives to convolutional neural networks owing to their ability to learn non-local region relationships in the spatial domain. The self-attention mechanism enables transformers to capture long-range dependencies in the images, which might be desirable for accelerated MRI image reconstruction as the effect of undersampling is non-local in the image domain. Despite their computational efficiency, window-based transformers suffer from restricted receptive fields as the dependencies are limited to within the scope of the image windows. We propose a window-based transformer network that integrates a dilated attention mechanism and convolution for accelerated MRI image reconstruction. The proposed network consists of dilated and dense neighborhood attention transformers to enhance the distant neighborhood pixel relationship and introduces depth-wise convolutions within the transformer module to learn low-level translation invariant features for accelerated MRI image reconstruction. The proposed model is trained in a self-supervised manner. We perform extensive experiments for multi-coil MRI acceleration for coronal PD, coronal PDFS and axial T2 contrasts with 4x and 5x under-sampling in self-supervised learning based on k-space splitting. We compare our method against other reconstruction architectures and the parallel domain self-supervised learning baseline. Results show that the proposed model exhibits improvement margins of (i) around 1.40 dB in PSNR and around 0.028 in SSIM on average over other architectures and (ii) around 1.44 dB in PSNR and around 0.029 in SSIM over parallel domain self-supervised learning. The code is available at https://github.com/rahul-gs-16/sdlformer.git
Submitted 8 August, 2023;
originally announced August 2023.
-
SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents
Authors:
Amirhossein Zolfagharian,
Manel Abdellatif,
Lionel C. Briand,
Ramesh S
Abstract:
Deep Reinforcement Learning (DRL) has made significant advancements in various fields, such as autonomous driving, healthcare, and robotics, by enabling agents to learn optimal policies through interactions with their environments. However, the application of DRL in safety-critical domains presents challenges, particularly concerning the safety of the learned policies. DRL agents, which are focused on maximizing rewards, may select unsafe actions, leading to safety violations. Runtime safety monitoring is thus essential to ensure the safe operation of these agents, especially in unpredictable and dynamic environments. This paper introduces SMARLA, a black-box safety monitoring approach specifically designed for DRL agents. SMARLA utilizes machine learning to predict safety violations by observing the agent's behavior during execution. The approach is based on Q-values, which reflect the expected reward for taking actions in specific states. SMARLA employs state abstraction to reduce the complexity of the state space, enhancing the predictive capabilities of the monitoring model. Such abstraction enables the early detection of unsafe states, allowing for the implementation of corrective and preventive measures before incidents occur. We quantitatively and qualitatively validated SMARLA on three well-known case studies widely used in DRL research. Empirical results reveal that SMARLA is accurate at predicting safety violations, with a low false positive rate, and can predict violations at an early stage, approximately halfway through the execution of the agent, before violations occur. We also discuss different decision criteria, based on confidence intervals of the predicted violation probabilities, to trigger safety mechanisms aiming at a trade-off between early detection and low false positive rates.
Submitted 22 October, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
A Simple Robot Selection Criteria After Path Planning Using Wavefront Algorithm
Authors:
Rajashekhar V S,
Dhaya C,
Dinakar Raj C K,
Dharshan P,
Mukesh Kumar S,
Harish B,
Ajith R,
Kamaleshwaran K
Abstract:
In this work we present a technique to select the best robot for accomplishing a task, assuming that the map of the environment is known in advance. To do so, the capabilities of the robots are listed and the environments where they can be used are mapped. Five robots are included for doing the tasks: the robotic lizard, half-humanoid, robotic snake, biped and quadruped. Each of these robots is capable of performing certain activities and also has its own limitations. The process of considering the robot performances and acting based on their limitations is the focus of this work. The wavefront algorithm is used to find the nature of the terrain. Based on the terrain, a suitable robot is selected from the list of five robots by the wavefront algorithm. Using this robot, the mission is accomplished.
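As background for the wavefront algorithm named in this abstract, here is a minimal sketch of the textbook grid-based wavefront planner: a breadth-first expansion from the goal labels every reachable free cell with its step distance, and a path is recovered by walking downhill through the labels. It does not reproduce the paper's terrain classification or robot selection step; the function names `wavefront` and `descend`, the 0/1 grid encoding, and the 4-connected neighborhood are assumptions made for illustration.

```python
from collections import deque

# 4-connected neighborhood (assumed; 8-connected variants also exist).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def wavefront(grid, goal):
    """Label each reachable free cell with its step distance to the goal.

    grid: list of rows, where 0 is a free cell and 1 is an obstacle.
    Returns a grid of the same shape; obstacles and unreachable cells are None.
    """
    rows, cols = len(grid), len(grid[0])
    goal_r, goal_c = goal
    if grid[goal_r][goal_c] != 0:
        raise ValueError("goal must be a free cell")
    dist = [[None] * cols for _ in range(rows)]
    dist[goal_r][goal_c] = 0
    queue = deque([(goal_r, goal_c)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1  # the wave advances one step
                queue.append((nr, nc))
    return dist


def descend(dist, start):
    """Follow decreasing labels from start to the goal; None if unreachable."""
    r, c = start
    if dist[r][c] is None:
        return None
    path = [(r, c)]
    while dist[r][c] != 0:
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(dist) and 0 <= nc < len(dist[0])
            if inside and dist[nr][nc] == dist[r][c] - 1:
                r, c = nr, nc
                path.append((r, c))
                break
    return path
```

For example, calling `wavefront(grid, (0, 0))` on a small map and then `descend(dist, start)` yields a shortest 4-connected path from `start` to the goal, or `None` when the start cell cannot reach it.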
Submitted 30 July, 2023;
originally announced July 2023.
-
Composite Diffusion | whole >= Σparts
Authors:
Vikram Jamwal,
Ramaneswaran S
Abstract:
For an artist or a graphic designer, the spatial layout of a scene is a critical design choice. However, existing text-to-image diffusion models provide limited support for incorporating spatial information. This paper introduces Composite Diffusion as a means for artists to generate high-quality images by composing from the sub-scenes. The artists can specify the arrangement of these sub-scenes through a flexible free-form segment layout. They can describe the content of each sub-scene primarily using natural text and additionally by utilizing reference images or control inputs such as line art, scribbles, human pose, canny edges, and more.
We provide a comprehensive and modular method for Composite Diffusion that enables alternative ways of generating, composing, and harmonizing sub-scenes. Further, we wish to evaluate the composite image for effectiveness in both image quality and achieving the artist's intent. We argue that existing image quality metrics lack a holistic evaluation of image composites. To address this, we propose novel quality criteria especially relevant to composite generation.
We believe that our approach provides an intuitive method of art creation. Through extensive user surveys, quantitative and qualitative analysis, we show how it achieves greater spatial, semantic, and creative control over image generation. In addition, our methods do not need to retrain or modify the architecture of the base diffusion models and can work in a plug-and-play manner with the fine-tuned models.
Submitted 25 July, 2023;
originally announced July 2023.
-
Winding number and circular 4-coloring of signed graphs
Authors:
Anna Gujgiczer,
Reza Naserasr,
Rohini S,
S Taruni
Abstract:
Concerning the recent notion of circular chromatic number of signed graphs, for each given integer $k$ we introduce two signed bipartite graphs, each on $2k^2-k+1$ vertices, having shortest negative cycle of length $2k$, and the circular chromatic number 4.
Each of the constructions can be viewed as a bipartite analogue of the generalized Mycielski graphs on odd cycles, $M_{\ell}(C_{2k+1})$. In the course of proving our result, we also obtain a simple proof of the fact that $M_{\ell}(C_{2k+1})$ and some similar quadrangulations of the projective plane have circular chromatic number 4. These proofs have the advantage that they illuminate, in an elementary manner, the strong relation between algebraic topology and graph coloring problems.
Submitted 1 March, 2024; v1 submitted 10 July, 2023;
originally announced July 2023.
-
MIMIC: Masked Image Modeling with Image Correspondences
Authors:
Kalyani Marathe,
Mahtab Bigverdi,
Nishat Khan,
Tuhin Kundu,
Patrick Howe,
Sharan Ranjit S,
Anand Bhattad,
Aniruddha Kembhavi,
Linda G. Shapiro,
Ranjay Krishna
Abstract:
Dense pixel-specific representation learning at scale has been bottlenecked due to the unavailability of large-scale multi-view datasets. Current methods for building effective pretraining datasets heavily rely on annotated 3D meshes, point clouds, and camera parameters from simulated environments, preventing them from building datasets from real-world data sources where such metadata is lacking. We propose a pretraining dataset-curation approach that does not require any additional annotations. Our method allows us to generate multi-view datasets from both real-world videos and simulated environments at scale. Specifically, we experiment with two scales: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs. We train multiple models with different masked image modeling objectives to showcase the following findings: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%), and surface normals estimation on Taskonomy (2.05%). For dense tasks which also require object understanding, we outperform MULTIVIEW-HABITAT, on semantic segmentation on ADE20K (3.89%), pose estimation on MSCOCO (9.4%), and reduce the gap with models pre-trained on the object-centric expensive ImageNet-1K. We outperform even when the representations are frozen, and when downstream training data is limited to few-shot. Larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can arbitrarily scale to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.
Submitted 15 May, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
GraphVine: A Data Structure to Optimize Dynamic Graph Processing on GPUs
Authors:
Rohith Krishnan S,
Venkata Kalyan Tavva,
Rupesh Nasre
Abstract:
Graph processing on GPUs is gaining momentum due to the high throughputs observed compared to traditional CPUs, attributed to the vast number of processing cores on GPUs that can exploit parallelism in graph analytics. This paper discusses a graph data structure for dynamic graph processing on GPUs. Unlike static graphs, dynamic graphs mutate over their lifetime through vertex and/or edge batch updates. The proposed work aims to provide fast batch updates and graph querying without consuming too much GPU memory. Experimental results show improved initialization timings by 1968-1269024%, improved batch edge insert timings by 30-30047%, and improved batch edge delete timings by 50-25262% while consuming less memory when the batch size is large.
Submitted 26 July, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
AdANNS: A Framework for Adaptive Semantic Search
Authors:
Aniket Rege,
Aditya Kusupati,
Sharan Ranjit S,
Alan Fan,
Qingqing Cao,
Sham Kakade,
Prateek Jain,
Ali Farhadi
Abstract:
Web-scale search systems learn an encoder to embed a given query which is then hooked into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations typically are rigid, high-dimensional vectors that are generally used as-is in the entire ANNS pipeline and can lead to computationally expensive retrieval. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks like search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than the rigid representations-based IVF at the same compute budget; and matches accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline constructed using rigid representations -- same accuracy at half the cost! We further show that the gains from AdANNS translate to modern-day composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on matryoshka representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
Submitted 18 October, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization
Authors:
Shivam Sharma,
Ramaneswaran S,
Udit Arora,
Md. Shad Akhtar,
Tanmoy Chakraborty
Abstract:
Memes are a powerful tool for communication over social media. Their affinity for evolving across politics, history, and sociocultural phenomena makes them an ideal communication vehicle. To comprehend the subtle message conveyed within a meme, one must understand the background that facilitates its holistic assimilation. Besides digital archiving of memes and their metadata by a few websites like knowyourmeme.com, currently, there is no efficient way to deduce a meme's context dynamically. In this work, we propose a novel task, MEMEX - given a meme and a related document, the aim is to mine the context that succinctly explains the background of the meme. At first, we develop MCC (Meme Context Corpus), a novel dataset for MEMEX. Further, to benchmark MCC, we propose MIME (MultImodal Meme Explainer), a multimodal neural framework that uses common sense enriched meme representation and a layered approach to capture the cross-modal semantic dependencies between the meme and the context. MIME surpasses several unimodal and multimodal systems and yields an absolute improvement of ~ 4% F1-score over the best baseline. Lastly, we conduct detailed analyses of MIME's performance, highlighting the aspects that could lead to optimal modeling of cross-modal contextual associations.
Submitted 27 May, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Predicting Stock Market Time-Series Data using CNN-LSTM Neural Network Model
Authors:
Aadhitya A,
Rajapriya R,
Vineetha R S,
Anurag M Bagde
Abstract:
The stock market is important because it represents ownership claims on businesses. Without sufficient stock, a company cannot perform well financially. Predicting the stock market performance of a company is hard because the price of its stock changes constantly, which makes the stock data complex to model. However, if the previous performance of a company in the stock market is known, we can track the data and provide predictions that help stockholders make sound decisions about handling the company's stock. Many machine learning models have been developed for this purpose, but they did not succeed for reasons such as the absence of advanced libraries and model inaccuracy when trained on real-time data. To track the patterns and features of the data, a CNN-LSTM neural network can be built. CNNs have recently been used in Natural Language Processing (NLP) applications; by identifying features in stock data and converting them into tensors, we can extract those features and pass them to an LSTM network to find patterns and thereby predict the stock market for a given period of time. The accuracy of the CNN-LSTM model is found to be high even when it is trained on real-time stock market data. This paper describes the features of the custom CNN-LSTM model, the experiments we ran with it (such as training on stock market datasets and comparing its performance with other models), and the end product obtained at the final stage.
Submitted 21 May, 2023;
originally announced May 2023.
-
Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control
Authors:
Vishnu Rajendran S,
Bappaditya Debnath,
Sariah Mghames,
Willow Mandil,
Soran Parsa,
Simon Parsons,
Amir Ghalamzan-E
Abstract:
This paper provides an overview of the current state-of-the-art in selective harvesting robots (SHRs) and their potential for addressing the challenges of global food production. SHRs have the potential to increase productivity, reduce labour costs, and minimise food waste by selectively harvesting only ripe fruits and vegetables. The paper discusses the main components of SHRs, including perception, grasping, cutting, motion planning, and control. It also highlights the challenges in developing SHR technologies, particularly in the areas of robot design, motion planning and control. The paper also discusses the potential benefits of integrating AI and soft robots and data-driven methods to enhance the performance and robustness of SHR systems. Finally, the paper identifies several open research questions in the field and highlights the need for further research and development efforts to advance SHR technologies to meet the challenges of global food production. Overall, this paper provides a starting point for researchers and practitioners interested in developing SHRs and highlights the need for more research in this field.
Submitted 19 April, 2023;
originally announced April 2023.
-
Control and Coordination of a SWARM of Unmanned Surface Vehicles using Deep Reinforcement Learning in ROS
Authors:
Shrudhi R S,
Sreyash Mohanty,
Susan Elias
Abstract:
An unmanned surface vehicle (USV) can perform complex missions by continuously observing the state of its surroundings and taking action toward a goal. A SWARM of USVs working together can complete missions faster, and more effectively than a single USV alone. In this paper, we propose an autonomous communication model for a swarm of USVs. The goal of this system is to implement a software system using Robot Operating System (ROS) and Gazebo. With the main objective of coordinated task completion, the Markov decision process (MDP) provides a base to formulate a task decision problem to achieve efficient localization and tracking in a highly dynamic water environment. To coordinate multiple USVs performing real-time target tracking, we propose an enhanced multi-agent reinforcement learning approach. Our proposed scheme uses MA-DDPG, or Multi-Agent Deep Deterministic Policy Gradient, an extension of the Deep Deterministic Policy Gradients (DDPG) algorithm that allows for decentralized control of multiple agents in a cooperative environment. MA-DDPG's decentralised control allows each and every agent to make decisions based on its own observations and objectives, which can lead to superior gross performance and improved stability. Additionally, it provides communication and coordination among agents through the use of collective readings and rewards.
Submitted 17 April, 2023;
originally announced April 2023.
-
SerPyTor: A distributed context-aware computational graph execution framework for durable execution
Authors:
Anuran Roy,
Sridhar Raj S
Abstract:
Distributed computation is always a tricky topic to deal with, especially in the context of differing requirements across scenarios. A popular solution is to use Apache Spark with a setup of multiple systems forming a cluster. However, the prerequisite setup for a Spark cluster often introduces additional overhead, limiting its usage in constrained scenarios, especially those requiring context propagation. In this paper, we explore a relatively lightweight computational graph execution framework that requires little setup and offers fast execution, coupled with context awareness.
Submitted 15 April, 2023;
originally announced April 2023.
-
SFT-KD-Recon: Learning a Student-friendly Teacher for Knowledge Distillation in Magnetic Resonance Image Reconstruction
Authors:
Matcha Naga Gayathri,
Sriprabha Ramanarayanan,
Mohammad Al Fahim,
Rahul G S,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Deep cascaded architectures for magnetic resonance imaging (MRI) acceleration have shown remarkable success in providing high-quality reconstruction. However, as the number of cascades increases, the improvements in reconstruction tend to become marginal, indicating possible excess model capacity. Knowledge distillation (KD) is an emerging technique to compress these models, in which a trained deep teacher network is used to distill knowledge to a smaller student network such that the student learns to mimic the behavior of the teacher. Most KD methods focus on effectively training the student with a pre-trained teacher unaware of the student model. We propose SFT-KD-Recon, a student-friendly teacher training approach along with the student as a prior step to KD to make the teacher aware of the structure and capacity of the student and enable aligning the representations of the teacher with the student. In SFT, the teacher is jointly trained with the unfolded branch configurations of the student blocks using three loss terms - teacher-reconstruction loss, student-reconstruction loss, and teacher-student imitation loss, followed by KD of the student. We perform extensive experiments for MRI acceleration in 4x and 5x under-sampling on the brain and cardiac datasets on five KD methods using the proposed approach as a prior step. We consider the DC-CNN architecture and setup teacher as D5C5 (141765 parameters), and student as D3C5 (49285 parameters), denoting a compression of 2.87:1. Results show that (i) our approach consistently improves the KD methods with improved reconstruction performance and image quality, and (ii) the student distilled using our approach is competitive with the teacher, with the performance gap reduced from 0.53 dB to 0.03 dB.
Submitted 11 April, 2023;
originally announced April 2023.
-
Acoustic Soft Tactile Skin (AST Skin)
Authors:
Vishnu Rajendran S,
Willow Mandil,
Simon Parsons,
Amir Ghalamzan E
Abstract:
This paper presents a novel soft tactile skin (STS) technology operating with sound waves. In this innovative approach, the sound waves generated by a speaker travel in channels embedded in a soft membrane and get modulated due to a deformation of the channel when pressed by an external force and received by a microphone at the end of the channel. The sensor leverages regression and classification methods for estimating the normal force and its contact location. Our sensor can be affixed to any robot part, e.g., end effectors or arm. We tested several regression and classifier methods to learn the relation between sound wave modulation, the applied force, and its location, respectively and picked the best-performing models for force and location predictions. Our novel tactile sensor yields 93% of the force estimation within 1.5 N tolerances for a range of 0-30+1 N and estimates contact locations with over 96% accuracy. We also demonstrated the performance of STS technology for a real-time gripping force control application.
Submitted 29 February, 2024; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Unified Software Design Patterns for Simulated Annealing
Authors:
Rohit Goswami,
Ruhila S.,
Amrita Goswami,
Sonaly Goswami,
Debabrata Goswami
Abstract:
Any optimization algorithm programming interface can be seen as a black-box function with additional free parameters. In this spirit, simulated annealing (SA) can be implemented in pseudo-code within the dimensions of a single slide with free parameters relating to the annealing schedule. Such an implementation, however, necessarily neglects much of the structure necessary to take advantage of advances in computing resources and algorithmic breakthroughs. Simulated annealing is often introduced in myriad disciplines, from discrete examples like the Traveling Salesman Problem (TSP) to molecular cluster potential energy exploration or even explorations of a protein's configurational space. Theoretical guarantees also demand a stricter structure in terms of statistical quantities, which cannot simply be left to the user. We will introduce several standard paradigms and demonstrate how these can be "lifted" into a unified framework using object-oriented programming in Python. We demonstrate how clean, interoperable, reproducible programming libraries can be used to access and rapidly iterate on variants of Simulated Annealing in a manner which can be extended to serve as a best practices blueprint or design pattern for a data-driven optimization library.
Submitted 23 February, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Predictive Barrier Lyapunov Function Based Control for Safe Trajectory Tracking of an Aerial Manipulator
Authors:
Vedant Mundheda,
Karan Mirakhor,
Rahul K S,
Harikumar Kandath,
Nagamanikandan Govindan
Abstract:
This paper proposes a novel controller framework that provides trajectory tracking for an Aerial Manipulator (AM) while ensuring the safe operation of the system under unknown bounded disturbances. The AM considered here is a 2-DOF (degrees-of-freedom) manipulator rigidly attached to a UAV. Our proposed controller structure follows the conventional inner loop PID control for attitude dynamics and an outer loop controller for tracking a reference trajectory. The outer loop control is based on the Model Predictive Control (MPC) with constraints derived using the Barrier Lyapunov Function (BLF) for the safe operation of the AM. BLF-based constraints are proposed for two objectives, viz. 1) To avoid the AM from colliding with static obstacles like a rectangular wall, and 2) To maintain the end effector of the manipulator within the desired workspace. The proposed BLF ensures that the above-mentioned objectives are satisfied even in the presence of unknown bounded disturbances. The capabilities of the proposed controller are demonstrated through high-fidelity non-linear simulations with parameters derived from a real laboratory scale AM. We compare the performance of our controller with other state-of-the-art MPC controllers for AM.
Submitted 8 December, 2022;
originally announced December 2022.
-
Machine Learning Algorithms for Time Series Analysis and Forecasting
Authors:
Rameshwar Garg,
Shriya Barpanda,
Girish Rao Salanke N S,
Ramya S
Abstract:
Time series data is being used everywhere, from sales records to patients' health evolution metrics. The ability to deal with this data has become a necessity, and time series analysis and forecasting are used for the same. Every Machine Learning enthusiast would consider these as very important tools, as they deepen the understanding of the characteristics of data. Forecasting is used to predict the value of a variable in the future, based on its past occurrences. A detailed survey of the various methods that are used for forecasting has been presented in this paper. The complete process of forecasting, from preprocessing to validation has also been explained thoroughly. Various statistical and deep learning models have been considered, notably, ARIMA, Prophet and LSTMs. Hybrid versions of Machine Learning models have also been explored and elucidated. Our work can be used by anyone to develop a good understanding of the forecasting process, and to identify various state of the art models which are being used today.
Submitted 25 November, 2022;
originally announced November 2022.
-
Butterflies: A new source of inspiration for futuristic aerial robotics
Authors:
Chakravarthi Jada,
Lokesh Ch. R. S,
Ashok Urlana,
Shridi Swamy Yerubandi,
Kantha Rao Bora,
Gouse Basha Shaik,
Pavan Baswani,
Balaraju Karri
Abstract:
Nature is home to an enormous number of species. All of these species perform complex activities using simple and elegant rules for their survival. The emergence of collective behaviour remarkably supports these activities. One form of collective behaviour is swarm intelligence, in which all agents possess the same rules and capabilities. This equality, together with local cooperation among the agents, leads to global results. Swarm behaviours in nature include bird formations, fish school maneuvers, and ant movement. Recently, one school of research has studied these behaviours and proposed artificial paradigms such as Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Glowworm Swarm Optimization (GSO). Another school of research has used these models to design robotic platforms that detect (locate) multiple signal sources such as light, fire, plume, and odour. The Kinbots platform is one such recent experiment. In the same line of thought, this extended abstract presents the recently proposed butterfly-inspired metaphor, the corresponding simulations, and ongoing experiments with their outcomes.
Submitted 24 August, 2022;
originally announced September 2022.
-
Multi-agent reinforcement learning for intent-based service assurance in cellular networks
Authors:
Satheesh K. Perepu,
Jean P. Martins,
Ricardo Souza S,
Kaushik Dey
Abstract:
Recently, intent-based management has received good attention in telecom networks owing to stringent performance requirements for many of the use cases. Several approaches in the literature employ traditional closed-loop driven methods to fulfill the intents on the KPIs. However, these methods consider every closed-loop independent of each other which degrades the combined performance. Also, such existing methods are not easily scalable. Multi-agent reinforcement learning (MARL) techniques have shown significant promise in many areas in which traditional closed-loop control falls short, typically for complex coordination and conflict management among loops. In this work, we propose a method based on MARL to achieve intent-based management without the need for knowing a model of the underlying system. Moreover, when there are conflicting intents, the MARL agents can implicitly incentivize the loops to cooperate and promote trade-offs, without human interaction, by prioritizing the important KPIs. Experiments have been performed on a network emulator for optimizing KPIs of three services. Results obtained demonstrate that the proposed system performs quite well and is able to fulfill all existing intents when there are enough resources or prioritize the KPIs when resources are scarce.
Submitted 26 August, 2022; v1 submitted 7 August, 2022;
originally announced August 2022.