Search | arXiv e-print repository

From Plain Text to Poetic Form: Generating Metrically-Constrained Sanskrit Verses

Authors: Manoj Balaji Jagadeeshan, Samarth Bhatia, Pretam Ray, Harshul Raj Surana, Akhil Rajeev P, Priya Mishra, Annarao Kulkarni, Ganesh Ramakrishnan, Prathosh AP, Pawan Goyal

Abstract: Recent advances in large language models (LLMs) have significantly improved natural language generation, including creative tasks like poetry composition. However, most progress remains concentrated in high-resource languages. This raises an important question: Can LLMs be adapted for structured poetic generation in a low-resource, morphologically rich language such as Sanskrit? In this work, we i… ▽ More Recent advances in large language models (LLMs) have significantly improved natural language generation, including creative tasks like poetry composition. However, most progress remains concentrated in high-resource languages. This raises an important question: Can LLMs be adapted for structured poetic generation in a low-resource, morphologically rich language such as Sanskrit? In this work, we introduce a dataset designed for translating English prose into structured Sanskrit verse, with strict adherence to classical metrical patterns, particularly the Anushtub meter. We evaluate a range of generative models-both open-source and proprietary-under multiple settings. Specifically, we explore constrained decoding strategies and instruction-based fine-tuning tailored to metrical and semantic fidelity. Our decoding approach achieves over 99% accuracy in producing syntactically valid poetic forms, substantially outperforming general-purpose models in meter conformity. Meanwhile, instruction-tuned variants show improved alignment with source meaning and poetic style, as supported by human assessments, albeit with marginal trade-offs in metrical precision. △ Less

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.19659 [pdf, ps, other]

LangDAug: Langevin Data Augmentation for Multi-Source Domain Generalization in Medical Image Segmentation

Authors: Piyush Tiwary, Kinjawl Bhattacharyya, Prathosh A. P

Abstract: Medical image segmentation models often struggle to generalize across different domains due to various reasons. Domain Generalization (DG) methods overcome this either through representation learning or data augmentation (DAug). While representation learning methods seek domain-invariant features, they often rely on ad-hoc techniques and lack formal guarantees. DAug methods, which enrich model rep… ▽ More Medical image segmentation models often struggle to generalize across different domains due to various reasons. Domain Generalization (DG) methods overcome this either through representation learning or data augmentation (DAug). While representation learning methods seek domain-invariant features, they often rely on ad-hoc techniques and lack formal guarantees. DAug methods, which enrich model representations through synthetic samples, have shown comparable or superior performance to representation learning approaches. We propose LangDAug, a novel $\textbf{Lang}$evin $\textbf{D}$ata $\textbf{Aug}$mentation for multi-source domain generalization in 2D medical image segmentation. LangDAug leverages Energy-Based Models (EBMs) trained via contrastive divergence to traverse between source domains, generating intermediate samples through Langevin dynamics. Theoretical analysis shows that LangDAug induces a regularization effect, and for GLMs, it upper-bounds the Rademacher complexity by the intrinsic dimensionality of the data manifold. Through extensive experiments on Fundus segmentation and 2D MRI prostate segmentation benchmarks, we show that LangDAug outperforms state-of-the-art domain generalization methods and effectively complements existing domain-randomization approaches. The codebase for our method is available at https://github.com/backpropagator/LangDAug. △ Less

Submitted 26 May, 2025; originally announced May 2025.

Comments: Accepted at ICML 2025

arXiv:2505.19105 [pdf, ps, other]

Latent Mamba Operator for Partial Differential Equations

Authors: Karn Tiwari, Niladri Dutta, N M Anoop Krishnan, Prathosh A P

Abstract: Neural operators have emerged as powerful data-driven frameworks for solving Partial Differential Equations (PDEs), offering significant speedups over numerical methods. However, existing neural operators struggle with scalability in high-dimensional spaces, incur high computational costs, and face challenges in capturing continuous and long-range dependencies in PDE dynamics. To address these lim… ▽ More Neural operators have emerged as powerful data-driven frameworks for solving Partial Differential Equations (PDEs), offering significant speedups over numerical methods. However, existing neural operators struggle with scalability in high-dimensional spaces, incur high computational costs, and face challenges in capturing continuous and long-range dependencies in PDE dynamics. To address these limitations, we introduce the Latent Mamba Operator (LaMO), which integrates the efficiency of state-space models (SSMs) in latent space with the expressive power of kernel integral formulations in neural operators. We also establish a theoretical connection between state-space models (SSMs) and the kernel integral of neural operators. Extensive experiments across diverse PDE benchmarks on regular grids, structured meshes, and point clouds covering solid and fluid physics datasets, LaMOs achieve consistent state-of-the-art (SOTA) performance, with a 32.3% improvement over existing baselines in solution operator approximation, highlighting its efficacy in modeling complex PDE solutions. △ Less

Submitted 28 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

arXiv:2504.19264 [pdf, ps, other]

Navigating AI Policy Landscapes: Insights into Human Rights Considerations Across IEEE Regions

Authors: Angel Mary John, Jerrin Thomas Panachakel, Anusha S. P

Abstract: This paper explores the integration of human rights considerations into AI regulatory frameworks across different IEEE regions - specifically the United States (Region 1-6), Europe (Region 8), China (part of Region 10), and Singapore (part of Region 10). While all acknowledge the transformative potential of AI and the necessity of ethical guidelines, their regulatory approaches significantly diffe… ▽ More This paper explores the integration of human rights considerations into AI regulatory frameworks across different IEEE regions - specifically the United States (Region 1-6), Europe (Region 8), China (part of Region 10), and Singapore (part of Region 10). While all acknowledge the transformative potential of AI and the necessity of ethical guidelines, their regulatory approaches significantly differ. Europe exhibits a rigorous framework with stringent protections for individual rights, while the U.S. promotes innovation with less restrictive regulations. China emphasizes state control and societal order in its AI strategies. In contrast, Singapore's advisory framework encourages self-regulation and aligns closely with international norms. This comparative analysis underlines the need for ongoing global dialogue to harmonize AI regulations that safeguard human rights while promoting technological advancement, reflecting the diverse perspectives and priorities of each region. △ Less

Submitted 27 April, 2025; originally announced April 2025.

Comments: 2024 IEEE 12th Region 10 Humanitarian Technology Conference (R10-HTC). IEEE, 2024

arXiv:2504.04954 [pdf, other]

GOTHAM: Graph Class Incremental Learning Framework under Weak Supervision

Authors: Aditya Hemant Shahane, Prathosh A. P, Sandeep Kumar

Abstract: Graphs are growing rapidly, along with the number of distinct label categories associated with them. Applications like e-commerce, healthcare, recommendation systems, and various social media platforms are rapidly moving towards graph representation of data due to their ability to capture both structural and attribute information. One crucial task in graph analysis is node classification, where un… ▽ More Graphs are growing rapidly, along with the number of distinct label categories associated with them. Applications like e-commerce, healthcare, recommendation systems, and various social media platforms are rapidly moving towards graph representation of data due to their ability to capture both structural and attribute information. One crucial task in graph analysis is node classification, where unlabeled nodes are categorized into predefined classes. In practice, novel classes appear incrementally sometimes with just a few labels (seen classes) or even without any labels (unseen classes), either because they are new or haven't been explored much. Traditional methods assume abundant labeled data for training, which isn't always feasible. We investigate a broader objective: \emph{Graph Class Incremental Learning under Weak Supervision (GCL)}, addressing this challenge by meta-training on base classes with limited labeled instances. During the incremental streams, novel classes can have few-shot or zero-shot representation. Our proposed framework GOTHAM efficiently accommodates these unlabeled nodes by finding the closest prototype representation, serving as class representatives in the attribute space. For Text-Attributed Graphs (TAGs), our framework additionally incorporates semantic information to enhance the representation. By employing teacher-student knowledge distillation to mitigate forgetting, GOTHAM achieves promising results across various tasks. Experiments on datasets such as Cora-ML, Amazon, and OBGN-Arxiv showcase the effectiveness of our approach in handling evolving graph data under limited supervision. The repository is available here: \href{https://github.com/adityashahane10/GOTHAM--Graph-based-Class-Incremental-Learning-Framework-under-Weak-Supervision}{\small \textcolor{blue}{Code}} △ Less

Submitted 7 April, 2025; originally announced April 2025.

arXiv:2503.05718 [pdf, other]

zScore: A Universal Decentralised Reputation System for the Blockchain Economy

Authors: Himanshu Udupi, Ashutosh Sahoo, Akshay S. P., Gurukiran S., Parag Paul, Petrus C. Martens

Abstract: Modern society functions on trust. The onchain economy, however, is built on the founding principles of trustless peer-to-peer interactions in an adversarial environment without a centralised body of trust and needs a verifiable system to quantify credibility to minimise bad economic activity. We provide a robust framework titled zScore, a core primitive for reputation derived from a wallet's onch… ▽ More Modern society functions on trust. The onchain economy, however, is built on the founding principles of trustless peer-to-peer interactions in an adversarial environment without a centralised body of trust and needs a verifiable system to quantify credibility to minimise bad economic activity. We provide a robust framework titled zScore, a core primitive for reputation derived from a wallet's onchain behaviour using state-of-the-art AI neural network models combined with real-world credentials ported onchain through zkTLS. The initial results tested on retroactive data from lending protocols establish a strong correlation between a good zScore and healthy borrowing and repayment behaviour, making it a robust and decentralised alibi for creditworthiness; we highlight significant improvements from previous attempts by protocols like Cred showcasing its robustness. We also present a list of possible applications of our system in Section 5, thereby establishing its utility in rewarding actual value creation while filtering noise and suspicious activity and flagging malicious behaviour by bad actors. △ Less

Submitted 17 February, 2025; originally announced March 2025.

ACM Class: K.4.4; I.2.11; C.2.4; K.4.2; H.3.5

arXiv:2502.00833 [pdf, other]

Cross multiscale vision transformer for deep fake detection

Authors: Akhshan P, Taneti Sanjay, Chandrakala S

Abstract: The proliferation of deep fake technology poses significant challenges to digital media authenticity, necessitating robust detection mechanisms. This project evaluates deep fake detection using the SP Cup's 2025 deep fake detection challenge dataset. We focused on exploring various deep learning models for detecting deep fake content, utilizing traditional deep learning techniques alongside newer… ▽ More The proliferation of deep fake technology poses significant challenges to digital media authenticity, necessitating robust detection mechanisms. This project evaluates deep fake detection using the SP Cup's 2025 deep fake detection challenge dataset. We focused on exploring various deep learning models for detecting deep fake content, utilizing traditional deep learning techniques alongside newer architectures. Our approach involved training a series of models and rigorously assessing their performance using metrics such as accuracy. △ Less

Submitted 2 February, 2025; originally announced February 2025.

arXiv:2501.06007 [pdf]

CoNOAir: A Neural Operator for Forecasting Carbon Monoxide Evolution in Cities

Authors: Sanchit Bedi, Karn Tiwari, Prathosh A. P., Sri Harsha Kota, N. M. Anoop Krishnan

Abstract: Carbon Monoxide (CO) is a dominant pollutant in urban areas due to the energy generation from fossil fuels for industry, automobile, and domestic requirements. Forecasting the evolution of CO in real-time can enable the deployment of effective early warning systems and intervention strategies. However, the computational cost associated with the physics and chemistry-based simulation makes it prohi… ▽ More Carbon Monoxide (CO) is a dominant pollutant in urban areas due to the energy generation from fossil fuels for industry, automobile, and domestic requirements. Forecasting the evolution of CO in real-time can enable the deployment of effective early warning systems and intervention strategies. However, the computational cost associated with the physics and chemistry-based simulation makes it prohibitive to implement such a model at the city and country scale. To address this challenge, here, we present a machine learning model based on neural operator, namely, Complex Neural Operator for Air Quality (CoNOAir), that can effectively forecast CO concentrations. We demonstrate this by developing a country-level model for short-term (hourly) and long-term (72-hour) forecasts of CO concentrations. Our model outperforms state-of-the-art models such as Fourier neural operators (FNO) and provides reliable predictions for both short and long-term forecasts. We further analyse the capability of the model to capture extreme events and generate forecasts in urban cities in India. Interestingly, we observe that the model predicts the next hour CO concentrations with R2 values greater than 0.95 for all the cities considered. The deployment of such a model can greatly assist the governing bodies to provide early warning, plan intervention strategies, and develop effective strategies by considering several what-if scenarios. Altogether, the present approach could provide a fillip to real-time predictions of CO pollution in urban cities. △ Less

Submitted 13 January, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

Comments: 28 pages, 14 figures, under submission process

arXiv:2412.13555 [pdf, other]

CPPJoules: An Energy Measurement Tool for C++

Authors: Shivadharshan S, Akilesh P, Rajrupa Chattaraj, Sridhar Chimalakonda

Abstract: With the increasing complexity of modern software and the demand for high performance, energy consumption has become a critical factor for developers and researchers. While much of the research community is focused on evaluating the energy consumption of machine learning and artificial intelligence systems -- often implemented in Python -- there is a gap when it comes to tools and frameworks for m… ▽ More With the increasing complexity of modern software and the demand for high performance, energy consumption has become a critical factor for developers and researchers. While much of the research community is focused on evaluating the energy consumption of machine learning and artificial intelligence systems -- often implemented in Python -- there is a gap when it comes to tools and frameworks for measuring energy usage in other programming languages. C++, in particular, remains a foundational language for a wide range of software applications, from game development to parallel programming frameworks, yet lacks dedicated energy measurement solutions. To address this, we have developed CPPJoules, a tool built on top of Intel-RAPL to measure the energy consumption of C++ code snippets. We have evaluated the tool by measuring the energy consumption of the standard computational tasks from the Rosetta Code repository. The demonstration of the tool is available at \url{https://www.youtube.com/watch?v=GZXYF3AKzPk} and related artifacts at \url{https://rishalab.github.io/CPPJoules/}. △ Less

Submitted 18 December, 2024; originally announced December 2024.

arXiv:2412.06089 [pdf, other]

GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis

Authors: Ashish Goswami, Satyam Kumar Modi, Santhosh Rishi Deshineni, Harman Singh, Prathosh A. P, Parag Singla

Abstract: Text-to-image (T2I) generation has seen significant progress with diffusion models, enabling generation of photo-realistic images from text prompts. Despite this progress, existing methods still face challenges in following complex text prompts, especially those requiring compositional and multi-step reasoning. Given such complex instructions, SOTA models often make mistakes in faithfully modeling… ▽ More Text-to-image (T2I) generation has seen significant progress with diffusion models, enabling generation of photo-realistic images from text prompts. Despite this progress, existing methods still face challenges in following complex text prompts, especially those requiring compositional and multi-step reasoning. Given such complex instructions, SOTA models often make mistakes in faithfully modeling object attributes, and relationships among them. In this work, we present an alternate paradigm for T2I synthesis, decomposing the task of complex multi-step generation into three steps, (a) Generate: we first generate an image using existing diffusion models (b) Plan: we make use of Multi-Modal LLMs (MLLMs) to identify the mistakes in the generated image expressed in terms of individual objects and their properties, and produce a sequence of corrective steps required in the form of an edit-plan. (c) Edit: we make use of an existing text-guided image editing models to sequentially execute our edit-plan over the generated image to get the desired image which is faithful to the original instruction. Our approach derives its strength from the fact that it is modular in nature, is training free, and can be applied over any combination of image generation and editing models. As an added contribution, we also develop a model capable of compositional editing, which further helps improve the overall accuracy of our proposed approach. Our method flexibly trades inference time compute with performance on compositional text prompts. We perform extensive experimental evaluation across 3 benchmarks and 10 T2I models including DALLE-3 and the latest -- SD-3.5-Large. Our approach not only improves the performance of the SOTA models, by upto 3 points, it also reduces the performance gap between weaker and stronger models. $\href{https://dair-iitd.github.io/GraPE/}{https://dair-iitd.github.io/GraPE/}$ △ Less

Submitted 11 March, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

arXiv:2411.05886 [pdf, other]

UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors

Authors: Suhas Srinath, Aditya Chandrasekar, Hemang Jamadagni, Rajiv Soundararajan, Prathosh A P

Abstract: With the rise of marine exploration, underwater imaging has gained significant attention as a research topic. Underwater video enhancement has become crucial for real-time computer vision tasks in marine exploration. However, most existing methods focus on enhancing individual frames and neglect video temporal dynamics, leading to visually poor enhancements. Furthermore, the lack of ground-truth r… ▽ More With the rise of marine exploration, underwater imaging has gained significant attention as a research topic. Underwater video enhancement has become crucial for real-time computer vision tasks in marine exploration. However, most existing methods focus on enhancing individual frames and neglect video temporal dynamics, leading to visually poor enhancements. Furthermore, the lack of ground-truth references limits the use of abundant available underwater video data in many applications. To address these issues, we propose a two-stage framework for enhancing underwater videos. The first stage uses a denoising diffusion probabilistic model to learn a generative prior from unlabeled data, capturing robust and descriptive feature representations. In the second stage, this prior is incorporated into a physics-based image formulation for spatial enhancement, while also enforcing temporal consistency between video frames. Our method enables real-time and computationally-efficient processing of high-resolution underwater videos at lower resolutions, and offers efficient enhancement in the presence of diverse water-types. Extensive experiments on four datasets show that our approach generalizes well and outperforms existing enhancement methods. Our code is available at github.com/suhas-srinath/undive. △ Less

Submitted 8 November, 2024; originally announced November 2024.

Comments: Accepted to IEEE/CVF WACV 2025

arXiv:2409.14217 [pdf, other]

doi 10.1145/3640457.3688073

Revisiting BPR: A Replicability Study of a Common Recommender System Baseline

Authors: Aleksandr Milogradskii, Oleg Lashinin, Alexander P, Marina Ananyeva, Sergey Kolesnikov

Abstract: Bayesian Personalized Ranking (BPR), a collaborative filtering approach based on matrix factorization, frequently serves as a benchmark for recommender systems research. However, numerous studies often overlook the nuances of BPR implementation, claiming that it performs worse than newly proposed methods across various tasks. In this paper, we thoroughly examine the features of the BPR model, indi… ▽ More Bayesian Personalized Ranking (BPR), a collaborative filtering approach based on matrix factorization, frequently serves as a benchmark for recommender systems research. However, numerous studies often overlook the nuances of BPR implementation, claiming that it performs worse than newly proposed methods across various tasks. In this paper, we thoroughly examine the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. Furthermore, through extensive experiments on real-world datasets under modern evaluation settings, we demonstrate that with proper tuning of its hyperparameters, the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets. Specifically, on the Million Song Dataset, the BPR model with hyperparameters tuning statistically significantly outperforms Mult-VAE by 10% in NDCG@100 with binary relevance function. △ Less

Submitted 18 October, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

Comments: This paper is accepted at the Reproducibility track of the ACM RecSys '24 conference

arXiv:2407.20383 [pdf]

Appraisal-Guided Proximal Policy Optimization: Modeling Psychological Disorders in Dynamic Grid World

Authors: Hari Prasad, Chinnu Jacob, Imthias Ahamed T. P

Abstract: The integration of artificial intelligence across multiple domains has emphasized the importance of replicating human-like cognitive processes in AI. By incorporating emotional intelligence into AI agents, their emotional stability can be evaluated to enhance their resilience and dependability in critical decision-making tasks. In this work, we develop a methodology for modeling psychological diso… ▽ More The integration of artificial intelligence across multiple domains has emphasized the importance of replicating human-like cognitive processes in AI. By incorporating emotional intelligence into AI agents, their emotional stability can be evaluated to enhance their resilience and dependability in critical decision-making tasks. In this work, we develop a methodology for modeling psychological disorders using Reinforcement Learning (RL) agents. We utilized Appraisal theory to train RL agents in a dynamic grid world environment with an Appraisal-Guided Proximal Policy Optimization (AG-PPO) algorithm. Additionally, we investigated numerous reward-shaping strategies to simulate psychological disorders and regulate the behavior of the agents. A comparison of various configurations of the modified PPO algorithm identified variants that simulate Anxiety disorder and Obsessive-Compulsive Disorder (OCD)-like behavior in agents. Furthermore, we compared standard PPO with AG-PPO and its configurations, highlighting the performance improvement in terms of generalization capabilities. Finally, we conducted an analysis of the agents' behavioral patterns in complex test environments to evaluate the associated symptoms corresponding to the psychological disorders. Overall, our work showcases the benefits of the appraisal-guided PPO algorithm over the standard PPO algorithm and the potential to simulate psychological disorders in a controlled artificial environment and evaluate them on RL agents. △ Less

Submitted 29 July, 2024; originally announced July 2024.

ACM Class: I.2.0

arXiv:2406.06027 [pdf, other]

HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Authors: Pranoy Panda, Ankush Agarwal, Chaitanya Devaguptapu, Manohar Kaul, Prathosh A P

Abstract: Given unstructured text, Large Language Models (LLMs) are adept at answering simple (single-hop) questions. However, as the complexity of the questions increase, the performance of LLMs degrade. We believe this is due to the overhead associated with understanding the complex question followed by filtering and aggregating unstructured information in the raw text. Recent methods try to reduce this b… ▽ More Given unstructured text, Large Language Models (LLMs) are adept at answering simple (single-hop) questions. However, as the complexity of the questions increase, the performance of LLMs degrade. We believe this is due to the overhead associated with understanding the complex question followed by filtering and aggregating unstructured information in the raw text. Recent methods try to reduce this burden by integrating structured knowledge triples into the raw text, aiming to provide a structured overview that simplifies information processing. However, this simplistic approach is query-agnostic and the extracted facts are ambiguous as they lack context. To address these drawbacks and to enable LLMs to answer complex (multi-hop) questions with ease, we propose to use a knowledge graph (KG) that is context-aware and is distilled to contain query-relevant information. The use of our compressed distilled KG as input to the LLM results in our method utilizing up to $67\%$ fewer tokens to represent the query relevant information present in the supporting documents, compared to the state-of-the-art (SoTA) method. Our experiments show consistent improvements over the SoTA across several metrics (EM, F1, BERTScore, and Human Eval) on two popular benchmark datasets (HotpotQA and MuSiQue). △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted at ACL 2024 in the main track

arXiv:2404.05763 [pdf]

Deep Learning-Based Brain Image Segmentation for Automated Tumour Detection

Authors: Suman Sourabh, Murugappan Valliappan, Narayana Darapaneni, Anwesh R P

Abstract: Introduction: The present study on the development and evaluation of an automated brain tumor segmentation technique based on deep learning using the 3D U-Net model. Objectives: The objective is to leverage state-of-the-art convolutional neural networks (CNNs) on a large dataset of brain MRI scans for segmentation. Methods: The proposed methodology applies pre-processing techniques for enhanced pe… ▽ More Introduction: The present study on the development and evaluation of an automated brain tumor segmentation technique based on deep learning using the 3D U-Net model. Objectives: The objective is to leverage state-of-the-art convolutional neural networks (CNNs) on a large dataset of brain MRI scans for segmentation. Methods: The proposed methodology applies pre-processing techniques for enhanced performance and generalizability. Results: Extensive validation on an independent dataset confirms the model's robustness and potential for integration into clinical workflows. The study emphasizes the importance of data pre-processing and explores various hyperparameters to optimize the model's performance. The 3D U-Net, has given IoUs for training and validation dataset have been 0.8181 and 0.66 respectively. Conclusion: Ultimately, this comprehensive framework showcases the efficacy of deep learning in automating brain tumour detection, offering valuable support in clinical practice. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04654 [pdf]

Music Recommendation Based on Facial Emotion Recognition

Authors: Rajesh B, Keerthana V, Narayana Darapaneni, Anwesh Reddy P

Abstract: Introduction: Music provides an incredible avenue for individuals to express their thoughts and emotions, while also serving as a delightful mode of entertainment for enthusiasts and music lovers. Objectives: This paper presents a comprehensive approach to enhancing the user experience through the integration of emotion recognition, music recommendation, and explainable AI using GRAD-CAM. Methods:… ▽ More Introduction: Music provides an incredible avenue for individuals to express their thoughts and emotions, while also serving as a delightful mode of entertainment for enthusiasts and music lovers. Objectives: This paper presents a comprehensive approach to enhancing the user experience through the integration of emotion recognition, music recommendation, and explainable AI using GRAD-CAM. Methods: The proposed methodology utilizes a ResNet50 model trained on the Facial Expression Recognition (FER) dataset, consisting of real images of individuals expressing various emotions. Results: The system achieves an accuracy of 82% in emotion classification. By leveraging GRAD-CAM, the model provides explanations for its predictions, allowing users to understand the reasoning behind the system's recommendations. The model is trained on both FER and real user datasets, which include labelled facial expressions, and real images of individuals expressing various emotions. The training process involves pre-processing the input images, extracting features through convolutional layers, reasoning with dense layers, and generating emotion predictions through the output layer. Conclusion: The proposed methodology, leveraging the Resnet50 model with ROI-based analysis and explainable AI techniques, offers a robust and interpretable solution for facial emotion detection paper. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04635 [pdf]

A Deep Look Into -- Automated Lung X-Ray Abnormality Detection System

Authors: Nagullas KS, Vivekanand. V, Narayana Darapaneni, Anwesh R P

Abstract: Introduction: Automated Lung X-Ray Abnormality Detection System is the application which distinguish the normal x-ray images from infected x-ray images and highlight area considered for prediction, with the recent pandemic a need to have a non-conventional method and faster detecting diseases, for which X ray serves the purpose. Obectives: As of current situation any viral disease that is infectio… ▽ More Introduction: Automated Lung X-Ray Abnormality Detection System is the application which distinguish the normal x-ray images from infected x-ray images and highlight area considered for prediction, with the recent pandemic a need to have a non-conventional method and faster detecting diseases, for which X ray serves the purpose. Obectives: As of current situation any viral disease that is infectious is potential pandemic, so there is need for cheap and early detection system. Methods: This research will help to eases the work of expert to do further analysis. Accuracy of three different preexisting models such as DenseNet, MobileNet and VGG16 were high but models over-fitted primarily due to black and white images. Results: This led to building up new method such as as V-BreathNet which gave more than 96% percent accuracy. Conclusion: Thus, it can be stated that not all state-of art CNN models can be used on B/W images. In conclusion not all state-of-art CNN models can be used on B/W images. △ Less

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2403.16246 [pdf, other]

Partially Blinded Unlearning: Class Unlearning for Deep Networks a Bayesian Perspective

Authors: Subhodip Panda, Shashwat Sourav, Prathosh A. P

Abstract: In order to adhere to regulatory standards governing individual data privacy and safety, machine learning models must systematically eliminate information derived from specific subsets of a user's training data that can no longer be utilized. The emerging discipline of Machine Unlearning has arisen as a pivotal area of research, facilitating the process of selectively discarding information design… ▽ More In order to adhere to regulatory standards governing individual data privacy and safety, machine learning models must systematically eliminate information derived from specific subsets of a user's training data that can no longer be utilized. The emerging discipline of Machine Unlearning has arisen as a pivotal area of research, facilitating the process of selectively discarding information designated to specific sets or classes of data from a pre-trained model, thereby eliminating the necessity for extensive retraining from scratch. The principal aim of this study is to formulate a methodology tailored for the purposeful elimination of information linked to a specific class of data from a pre-trained classification network. This intentional removal is crafted to degrade the model's performance specifically concerning the unlearned data class while concurrently minimizing any detrimental impacts on the model's performance in other classes. To achieve this goal, we frame the class unlearning problem from a Bayesian perspective, which yields a loss function that minimizes the log-likelihood associated with the unlearned data with a stability regularization in parameter space. This stability regularization incorporates Mohalanobis distance with respect to the Fisher Information matrix and $l_2$ distance from the pre-trained model parameters. Our novel approach, termed \textbf{Partially-Blinded Unlearning (PBU)}, surpasses existing state-of-the-art class unlearning methods, demonstrating superior effectiveness. Notably, PBU achieves this efficacy without requiring awareness of the entire training dataset but only to the unlearned data points, marking a distinctive feature of its performance. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.08261 [pdf, other]

CoroNetGAN: Controlled Pruning of GANs via Hypernetworks

Authors: Aman Kumar, Khushboo Anand, Shubham Mandloi, Ashutosh Mishra, Avinash Thakur, Neeraj Kasera, Prathosh A P

Abstract: Generative Adversarial Networks (GANs) have proven to exhibit remarkable performance and are widely used across many generative computer vision applications. However, the unprecedented demand for the deployment of GANs on resource-constrained edge devices still poses a challenge due to huge number of parameters involved in the generation process. This has led to focused attention on the area of co… ▽ More Generative Adversarial Networks (GANs) have proven to exhibit remarkable performance and are widely used across many generative computer vision applications. However, the unprecedented demand for the deployment of GANs on resource-constrained edge devices still poses a challenge due to huge number of parameters involved in the generation process. This has led to focused attention on the area of compressing GANs. Most of the existing works use knowledge distillation with the overhead of teacher dependency. Moreover, there is no ability to control the degree of compression in these methods. Hence, we propose CoroNet-GAN for compressing GAN using the combined strength of differentiable pruning method via hypernetworks. The proposed method provides the advantage of performing controllable compression while training along with reducing training time by a substantial factor. Experiments have been done on various conditional GAN architectures (Pix2Pix and CycleGAN) to signify the effectiveness of our approach on multiple benchmark datasets such as Edges-to-Shoes, Horse-to-Zebra and Summer-to-Winter. The results obtained illustrate that our approach succeeds to outperform the baselines on Zebra-to-Horse and Summer-to-Winter achieving the best FID score of 32.3 and 72.3 respectively, yielding high-fidelity images across all the datasets. Additionally, our approach also outperforms the state-of-the-art methods in achieving better inference time on various smart-phone chipsets and data-types making it a feasible solution for deployment on edge devices. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.06797 [pdf]

Leveraging Internal Representations of Model for Magnetic Image Classification

Authors: Adarsh N L, Arun P V, Alok Porwal, Malcolm Aranha

Abstract: Data generated by edge devices has the potential to train intelligent autonomous systems across various domains. Despite the emergence of diverse machine learning approaches addressing privacy concerns and utilizing distributed data, security issues persist due to the sensitive storage of data shards in disparate locations. This paper introduces a potentially groundbreaking paradigm for machine le… ▽ More Data generated by edge devices has the potential to train intelligent autonomous systems across various domains. Despite the emergence of diverse machine learning approaches addressing privacy concerns and utilizing distributed data, security issues persist due to the sensitive storage of data shards in disparate locations. This paper introduces a potentially groundbreaking paradigm for machine learning model training, specifically designed for scenarios with only a single magnetic image and its corresponding label image available. We harness the capabilities of Deep Learning to generate concise yet informative samples, aiming to overcome data scarcity. Through the utilization of deep learning's internal representations, our objective is to efficiently address data scarcity issues and produce meaningful results. This methodology presents a promising avenue for training machine learning models with minimal data. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 5 Pages, 6 Figures

arXiv:2403.06735 [pdf]

Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback

Authors: Adarsh N L, Arun P V, Aravindh N L

Abstract: Research on generative models to produce human-aligned / human-preferred outputs has seen significant recent contributions. Between text and image-generative models, we narrowed our focus to text-based generative models, particularly to produce captions for images that align with human preferences. In this research, we explored a potential method to amplify the performance of the Deep Neural Netwo… ▽ More Research on generative models to produce human-aligned / human-preferred outputs has seen significant recent contributions. Between text and image-generative models, we narrowed our focus to text-based generative models, particularly to produce captions for images that align with human preferences. In this research, we explored a potential method to amplify the performance of the Deep Neural Network Model to generate captions that are preferred by humans. This was achieved by integrating Supervised Learning and Reinforcement Learning with Human Feedback (RLHF) using the Flickr8k dataset. Also, a novel loss function that is capable of optimizing the model based on human feedback is introduced. In this paper, we provide a concise sketch of our approach and results, hoping to contribute to the ongoing advances in the field of human-aligned generative AI models. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 6 Pages, 8 figures

arXiv:2402.19450 [pdf, other]

Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

Authors: Saurabh Srivastava, Annarose M B, Anto P V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

Abstract: We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with… ▽ More We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with functionalization of other benchmarks to follow. When evaluating current state-of-the-art models over snapshots of MATH(), we find a reasoning gap -- the percentage difference between the static and functional accuracies. We find reasoning gaps from 58.35% to 80.31% among the state-of-the-art closed and open weights models that perform well on static benchmarks, with the caveat that the gaps are likely to be smaller with more sophisticated prompting strategies. Here we show that models which anecdotally have good reasoning performance over real-world tasks, have quantifiable lower gaps, motivating the open problem of building "gap 0" models. Code for evaluation and new evaluation datasets, three MATH() snapshots, are publicly available at https://github.com/consequentai/fneval/. △ Less

Submitted 29 February, 2024; originally announced February 2024.

Comments: 37 pages, 10 figures

arXiv:2311.17960 [pdf, other]

Guided Prompting in SAM for Weakly Supervised Cell Segmentation in Histopathological Images

Authors: Aayush Kumar Tyagi, Vaibhav Mishra, Prathosh A. P., Mausam

Abstract: Cell segmentation in histopathological images plays a crucial role in understanding, diagnosing, and treating many diseases. However, data annotation for this is expensive since there can be a large number of cells per image, and expert pathologists are needed for labelling images. Instead, our paper focuses on using weak supervision -- annotation from related tasks -- to induce a segmenter. Recen… ▽ More Cell segmentation in histopathological images plays a crucial role in understanding, diagnosing, and treating many diseases. However, data annotation for this is expensive since there can be a large number of cells per image, and expert pathologists are needed for labelling images. Instead, our paper focuses on using weak supervision -- annotation from related tasks -- to induce a segmenter. Recent foundation models, such as Segment Anything (SAM), can use prompts to leverage additional supervision during inference. SAM has performed remarkably well in natural image segmentation tasks; however, its applicability to cell segmentation has not been explored. In response, we investigate guiding the prompting procedure in SAM for weakly supervised cell segmentation when only bounding box supervision is available. We develop two workflows: (1) an object detector's output as a test-time prompt to SAM (D-SAM), and (2) SAM as pseudo mask generator over training data to train a standalone segmentation model (SAM-S). On finding that both workflows have some complementary strengths, we develop an integer programming-based approach to reconcile the two sets of segmentation masks, achieving yet higher performance. We experiment on three publicly available cell segmentation datasets namely, ConSep, MoNuSeg, and TNBC, and find that all SAM-based solutions hugely outperform existing weakly supervised image segmentation models, obtaining 9-15 pt Dice gains. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2310.18263 [pdf, other]

MalFake: A Multimodal Fake News Identification for Malayalam using Recurrent Neural Networks and VGG-16

Authors: Adhish S. Sujan, Ajitha. V, Aleena Benny, Amiya M. P., V. S. Anoop

Abstract: The amount of news being consumed online has substantially expanded in recent years. Fake news has become increasingly common, especially in regional languages like Malayalam, due to the rapid publication and lack of editorial standards on some online sites. Fake news may have a terrible effect on society, causing people to make bad judgments, lose faith in authorities, and even engage in violent… ▽ More The amount of news being consumed online has substantially expanded in recent years. Fake news has become increasingly common, especially in regional languages like Malayalam, due to the rapid publication and lack of editorial standards on some online sites. Fake news may have a terrible effect on society, causing people to make bad judgments, lose faith in authorities, and even engage in violent behavior. When we take into the context of India, there are many regional languages, and fake news is spreading in every language. Therefore, providing efficient techniques for identifying false information in regional tongues is crucial. Until now, little to no work has been done in Malayalam, extracting features from multiple modalities to classify fake news. Multimodal approaches are more accurate in detecting fake news, as features from multiple modalities are extracted to build the deep learning classification model. As far as we know, this is the first piece of work in Malayalam that uses multimodal deep learning to tackle false information. Models trained with more than one modality typically outperform models taught with only one modality. Our study in the Malayalam language utilizing multimodal deep learning is a significant step toward more effective misinformation detection and mitigation. △ Less

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.02094 [pdf, other]

CoNO: Complex Neural Operator for Continuous Dynamical Systems

Authors: Karn Tiwari, N M Anoop Krishnan, Prathosh A P

Abstract: Neural operators extend data-driven models to map between infinite-dimensional functional spaces. These models have successfully solved continuous dynamical systems represented by differential equations, viz weather forecasting, fluid flow, or solid mechanics. However, the existing operators still rely on real space, thereby losing rich representations potentially captured in the complex space by… ▽ More Neural operators extend data-driven models to map between infinite-dimensional functional spaces. These models have successfully solved continuous dynamical systems represented by differential equations, viz weather forecasting, fluid flow, or solid mechanics. However, the existing operators still rely on real space, thereby losing rich representations potentially captured in the complex space by functional transforms. In this paper, we introduce a Complex Neural Operator (CoNO), that parameterizes the integral kernel in the complex fractional Fourier domain. Additionally, the model employing a complex-valued neural network along with aliasing-free activation functions preserves the complex values and complex algebraic properties, thereby enabling improved representation, robustness to noise, and generalization. We show that the model effectively captures the underlying partial differential equation with a single complex fractional Fourier transform. We perform an extensive empirical evaluation of CoNO on several datasets and additional tasks such as zero-shot super-resolution, evaluation of out-of-distribution data, data efficiency, and robustness to noise. CoNO exhibits comparable or superior performance to all the state-of-the-art models in these tasks. Altogether, CoNO presents a robust and superior model for modeling continuous dynamical systems, providing a fillip to scientific machine learning. △ Less

Submitted 4 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2310.01650 [pdf, other]

CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems

Authors: Priyanshu Burark, Karn Tiwari, Meer Mehran Rashid, Prathosh A P, N M Anoop Krishnan

Abstract: Continuous dynamical systems, characterized by differential equations, are ubiquitously used to model several important problems: plasma dynamics, flow through porous media, weather forecasting, and epidemic dynamics. Recently, a wide range of data-driven models has been used successfully to model these systems. However, in contrast to established fields like computer vision, limited studies are a… ▽ More Continuous dynamical systems, characterized by differential equations, are ubiquitously used to model several important problems: plasma dynamics, flow through porous media, weather forecasting, and epidemic dynamics. Recently, a wide range of data-driven models has been used successfully to model these systems. However, in contrast to established fields like computer vision, limited studies are available analyzing the strengths and potential applications of different classes of these models that could steer decision-making in scientific machine learning. Here, we introduce CodBench, an exhaustive benchmarking suite comprising 11 state-of-the-art data-driven models for solving differential equations. Specifically, we comprehensively evaluate 4 distinct categories of models, viz., feed forward neural networks, deep operator regression models, frequency-based neural operators, and transformer architectures against 8 widely applicable benchmark datasets encompassing challenges from fluid and solid mechanics. We conduct extensive experiments, assessing the operators' capabilities in learning, zero-shot super-resolution, data efficiency, robustness to noise, and computational efficiency. Interestingly, our findings highlight that current operators struggle with the newer mechanics datasets, motivating the need for more robust neural operators. All the datasets and codes will be shared in an easy-to-use fashion for the scientific community. We hope this resource will be an impetus for accelerated progress and exploration in modeling dynamical systems. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.14054 [pdf, other]

Adapt then Unlearn: Exploring Parameter Space Semantics for Unlearning in Generative Adversarial Networks

Authors: Piyush Tiwary, Atri Guha, Subhodip Panda, Prathosh A. P

Abstract: Owing to the growing concerns about privacy and regulatory compliance, it is desirable to regulate the output of generative models. To that end, the objective of this work is to prevent the generation of outputs containing undesired features from a pre-trained Generative Adversarial Network (GAN) where the underlying training data set is inaccessible. Our approach is inspired by the observation th… ▽ More Owing to the growing concerns about privacy and regulatory compliance, it is desirable to regulate the output of generative models. To that end, the objective of this work is to prevent the generation of outputs containing undesired features from a pre-trained Generative Adversarial Network (GAN) where the underlying training data set is inaccessible. Our approach is inspired by the observation that the parameter space of GANs exhibits meaningful directions that can be leveraged to suppress specific undesired features. However, such directions usually result in the degradation of the quality of generated samples. Our proposed two-stage method, known as 'Adapt-then-Unlearn,' excels at unlearning such undesirable features while also maintaining the quality of generated samples. In the initial stage, we adapt a pre-trained GAN on a set of negative samples (containing undesired features) provided by the user. Subsequently, we train the original pre-trained GAN using positive samples, along with a repulsion regularizer. This regularizer encourages the learned model parameters to move away from the parameters of the adapted model (first stage) while not degrading the generation quality. We provide theoretical insights into the proposed method. To the best of our knowledge, our approach stands as the first method addressing unlearning within the realm of high-fidelity GANs (such as StyleGAN). We validate the effectiveness of our method through comprehensive experiments, encompassing both class-level unlearning on the MNIST and AFHQ dataset and feature-level unlearning tasks on the CelebA-HQ dataset. Our code and implementation is available at: https://github.com/atriguha/Adapt_Unlearn. △ Less

Submitted 12 February, 2025; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted at Transactions on Machine Learning Research (TMLR)

arXiv:2309.09328 [pdf, other]

doi 10.2991/978-94-6463-314-6_27

Enhancing Knee Osteoarthritis severity level classification using diffusion augmented images

Authors: Paleti Nikhil Chowdary, Gorantla V N S L Vishnu Vardhan, Menta Sai Akshay, Menta Sai Aashish, Vadlapudi Sai Aravind, Garapati Venkata Krishna Rayalu, Aswathy P

Abstract: This research paper explores the classification of knee osteoarthritis (OA) severity levels using advanced computer vision models and augmentation techniques. The study investigates the effectiveness of data preprocessing, including Contrast-Limited Adaptive Histogram Equalization (CLAHE), and data augmentation using diffusion models. Three experiments were conducted: training models on the origin… ▽ More This research paper explores the classification of knee osteoarthritis (OA) severity levels using advanced computer vision models and augmentation techniques. The study investigates the effectiveness of data preprocessing, including Contrast-Limited Adaptive Histogram Equalization (CLAHE), and data augmentation using diffusion models. Three experiments were conducted: training models on the original dataset, training models on the preprocessed dataset, and training models on the augmented dataset. The results show that data preprocessing and augmentation significantly improve the accuracy of the models. The EfficientNetB3 model achieved the highest accuracy of 84\% on the augmented dataset. Additionally, attention visualization techniques, such as Grad-CAM, are utilized to provide detailed attention maps, enhancing the understanding and trustworthiness of the models. These findings highlight the potential of combining advanced models with augmented data and attention visualization for accurate knee OA severity classification. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: Paper has been accepted to be presented at ICACECS 2023 and the final version will be published by Atlantis Highlights in Computer Science (AHCS) , Atlantis Press(part of Springer Nature)

Journal ref: Proceedings of the International e-Conference on Advances in Computer Engineering and Communication Systems (ICACECS 2023) 266-274 (2023)

arXiv:2309.05352 [pdf, other]

Neural Discovery of Permutation Subgroups

Authors: Pavan Karjol, Rohan Kashyap, Prathosh A P

Abstract: We consider the problem of discovering subgroup $H$ of permutation group $S_{n}$. Unlike the traditional $H$-invariant networks wherein $H$ is assumed to be known, we present a method to discover the underlying subgroup, given that it satisfies certain conditions. Our results show that one could discover any subgroup of type $S_{k} (k \leq n)$ by learning an $S_{n}$-invariant function and a linear… ▽ More We consider the problem of discovering subgroup $H$ of permutation group $S_{n}$. Unlike the traditional $H$-invariant networks wherein $H$ is assumed to be known, we present a method to discover the underlying subgroup, given that it satisfies certain conditions. Our results show that one could discover any subgroup of type $S_{k} (k \leq n)$ by learning an $S_{n}$-invariant function and a linear transformation. We also prove similar results for cyclic and dihedral subgroups. Finally, we provide a general theorem that can be extended to discover other subgroups of $S_{n}$. We also demonstrate the applicability of our results through numerical experiments on image-digit sum and symmetric polynomial regression tasks. △ Less

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: In International Conference on Artificial Intelligence and Statistics, pp. 4668-4678. Volume 206. PMLR, 2023

arXiv:2309.02898 [pdf, other]

A Unified Framework for Discovering Discrete Symmetries

Authors: Pavan Karjol, Rohan Kashyap, Aditya Gopalan, Prathosh A. P

Abstract: We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions inv… ▽ More We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions invariant to these subgroups in a principled manner. The structure of the architecture enables us to leverage multi-armed bandit algorithms and gradient descent to efficiently optimize over the linear and the non-linear functions, respectively, and to infer the symmetry that is ultimately learnt. We also discuss the necessity of the matrix-valued functions in the architecture. Experiments on image-digit sum and polynomial regression tasks demonstrate the effectiveness of our approach. △ Less

Submitted 27 October, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.01487 [pdf, other]

doi 10.1109/TMI.2024.3453492

GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation

Authors: Vishnuvardhan Purma, Suhas Srinath, Seshan Srirangarajan, Aanchal Kakkar, Prathosh A. P

Abstract: Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis from experienced pathologists for accurate examinations. To reduce this burden, supervised machine-learning approaches have been adopted using large-scale annotated datasets for histopathological image analysis. However, in several scenarios, the availability of large-scale annotated data is a bot… ▽ More Histopathological image segmentation is a laborious and time-intensive task, often requiring analysis from experienced pathologists for accurate examinations. To reduce this burden, supervised machine-learning approaches have been adopted using large-scale annotated datasets for histopathological image analysis. However, in several scenarios, the availability of large-scale annotated data is a bottleneck while training such models. Self-supervised learning (SSL) is an alternative paradigm that provides some respite by constructing models utilizing only the unannotated data which is often abundant. The basic idea of SSL is to train a network to perform one or many pseudo or pretext tasks on unannotated data and use it subsequently as the basis for a variety of downstream tasks. It is seen that the success of SSL depends critically on the considered pretext task. While there have been many efforts in designing pretext tasks for classification problems, there haven't been many attempts on SSL for histopathological segmentation. Motivated by this, we propose an SSL approach for segmenting histopathological images via generative diffusion models in this paper. Our method is based on the observation that diffusion models effectively solve an image-to-image translation task akin to a segmentation task. Hence, we propose generative diffusion as the pretext task for histopathological image segmentation. We also propose a multi-loss function-based fine-tuning for the downstream task. We validate our method using several metrics on two publically available datasets along with a newly proposed head and neck (HN) cancer dataset containing hematoxylin and eosin (H\&E) stained images along with annotations. Codes will be made public at https://github.com/suhas-srinath/GenSelfDiff-HIS. △ Less

Submitted 10 September, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE Transactions on Medical Imaging, 2024

arXiv:2308.04469 [pdf]

Correlating Medi-Claim Service by Deep Learning Neural Networks

Authors: Jayanthi Vajiram, Negha Senthil, Nean Adhith. P

Abstract: Medical insurance claims are of organized crimes related to patients, physicians, diagnostic centers, and insurance providers, forming a chain reaction that must be monitored constantly. These kinds of frauds affect the financial growth of both insured people and health insurance companies. The Convolution Neural Network architecture is used to detect fraudulent claims through a correlation study… ▽ More Medical insurance claims are of organized crimes related to patients, physicians, diagnostic centers, and insurance providers, forming a chain reaction that must be monitored constantly. These kinds of frauds affect the financial growth of both insured people and health insurance companies. The Convolution Neural Network architecture is used to detect fraudulent claims through a correlation study of regression models, which helps to detect money laundering on different claims given by different providers. Supervised and unsupervised classifiers are used to detect fraud and non-fraud claims. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2303.11278 [pdf, other]

Bayesian Pseudo-Coresets via Contrastive Divergence

Authors: Piyush Tiwary, Kumar Shubham, Vivek V. Kashyap, Prathosh A. P

Abstract: Bayesian methods provide an elegant framework for estimating parameter posteriors and quantification of uncertainty associated with probabilistic models. However, they often suffer from slow inference times. To address this challenge, Bayesian Pseudo-Coresets (BPC) have emerged as a promising solution. BPC methods aim to create a small synthetic dataset, known as pseudo-coresets, that approximates… ▽ More Bayesian methods provide an elegant framework for estimating parameter posteriors and quantification of uncertainty associated with probabilistic models. However, they often suffer from slow inference times. To address this challenge, Bayesian Pseudo-Coresets (BPC) have emerged as a promising solution. BPC methods aim to create a small synthetic dataset, known as pseudo-coresets, that approximates the posterior inference achieved with the original dataset. This approximation is achieved by optimizing a divergence measure between the true posterior and the pseudo-coreset posterior. Various divergence measures have been proposed for constructing pseudo-coresets, with forward Kullback-Leibler (KL) divergence being the most successful. However, using forward KL divergence necessitates sampling from the pseudo-coreset posterior, often accomplished through approximate Gaussian variational distributions. Alternatively, one could employ Markov Chain Monte Carlo (MCMC) methods for sampling, but this becomes challenging in high-dimensional parameter spaces due to slow mixing. In this study, we introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence. Importantly, optimizing contrastive divergence eliminates the need for approximations in the pseudo-coreset construction process. Furthermore, it enables the use of finite-step MCMC methods, alleviating the requirement for extensive mixing to reach a stationary distribution. To validate our method's effectiveness, we conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques. △ Less

Submitted 8 May, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

Comments: Accepted at UAI 2024

arXiv:2211.15920 [pdf, other]

Discrete Control in Real-World Driving Environments using Deep Reinforcement Learning

Authors: Avinash Amballa, Advaith P., Pradip Sasmal, Sumohana Channappayya

Abstract: Training self-driving cars is often challenging since they require a vast amount of labeled data in multiple real-world contexts, which is computationally and memory intensive. Researchers often resort to driving simulators to train the agent and transfer the knowledge to a real-world setting. Since simulators lack realistic behavior, these methods are quite inefficient. To address this issue, we… ▽ More Training self-driving cars is often challenging since they require a vast amount of labeled data in multiple real-world contexts, which is computationally and memory intensive. Researchers often resort to driving simulators to train the agent and transfer the knowledge to a real-world setting. Since simulators lack realistic behavior, these methods are quite inefficient. To address this issue, we introduce a framework (perception, planning, and control) in a real-world driving environment that transfers the real-world environments into gaming environments by setting up a reliable Markov Decision Process (MDP). We propose variations of existing Reinforcement Learning (RL) algorithms in a multi-agent setting to learn and execute the discrete control in real-world environments. Experiments show that the multi-agent setting outperforms the single-agent setting in all the scenarios. We also propose reliable initialization, data augmentation, and training techniques that enable the agents to learn and generalize to navigate in a real-world environment with minimal input video data, and with minimal training. Additionally, to show the efficacy of our proposed algorithm, we deploy our method in the virtual driving environment TORCS. △ Less

Submitted 30 November, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: 14 pages

arXiv:2211.06313 [pdf]

BioJam Camp: toward justice through bioengineering and biodesign co-learning with youth

Authors: Callie Chappell, Henry A. -A., Elvia B. O., Emily B., Bailey B., Jacqueline C. -M., Caroline Daws, Cristian F., Emiliano G., Page Goddard, Xavier G., Anne Hu, Gabriela J., Kelley Langhans, Briana Martin-Villa, Penny M. -S., Jennifer M., Soyang N., Melissa Ortiz, Aryana P., Trisha S, Corinne Takara, Emily T., Paloma Vazquez, Rolando Perez , et al. (1 additional authors not shown)

Abstract: BioJam is a political, artistic, and educational project in which Bay Area artists, scientists, and educators collaborate with youth and communities of color to address historical exclusion of their communities in STEM fields and reframe what science can be. As an intergenerational collective, we co-learn on topics of culture (social and biological), community (cultural and ecological), and creati… ▽ More BioJam is a political, artistic, and educational project in which Bay Area artists, scientists, and educators collaborate with youth and communities of color to address historical exclusion of their communities in STEM fields and reframe what science can be. As an intergenerational collective, we co-learn on topics of culture (social and biological), community (cultural and ecological), and creativity. We reject the notion that increasing the number of scientists of color requires inculcation in the ways of the dominant culture. Instead, we center cultural practices, traditional ways of knowing, storytelling, art, experiential learning, and community engagement to break down the framing that positions these practices as distinct from science. The goal of this work is to realize a future in which the practice of science is relatable, accessible, and liberatory. △ Less

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.17185 [pdf, other]

SurfMyoAiR: A surface Electromyography based framework for Airwriting Recognition

Authors: Ayush Tripathi, Lalan Kumar, Prathosh A. P., Suriya Prakash Muthukrishnan

Abstract: Airwriting Recognition is the task of identifying letters written in free space with finger movement. Electromyography (EMG) is a technique used to record electrical activity during muscle contraction and relaxation as a result of movement and is widely used for gesture recognition. Most of the current research in gesture recognition is focused on identifying static gestures. However, dynamic gest… ▽ More Airwriting Recognition is the task of identifying letters written in free space with finger movement. Electromyography (EMG) is a technique used to record electrical activity during muscle contraction and relaxation as a result of movement and is widely used for gesture recognition. Most of the current research in gesture recognition is focused on identifying static gestures. However, dynamic gestures are natural and user-friendly for being used as alternate input methods in Human-Computer Interaction applications. Airwriting recognition using EMG signals recorded from forearm muscles is therefore a viable solution. Since the user does not need to learn any new gestures and a large range of words can be formed by concatenating these letters, it is generalizable to a wider population. There has been limited work in recognition of airwriting using EMG signals and forms the core idea of the current work. The SurfMyoAiR dataset comprising of EMG signals recorded during writing English uppercase alphabets is constructed. Several different time-domain features to construct EMG envelope and two different time-frequency image representations: Short-Time Fourier Transform and Continuous Wavelet Transform were explored to form the input to a deep learning model for airwriting recognition. Several different deep learning architectures were exploited for this task. Additionally, the effect of various parameters such as signal length, window length and interpolation techniques on the recognition performance is comprehensively explored. The best-achieved accuracy was 78.50% and 62.19% in user-dependent and independent scenarios respectively by using Short-Time Fourier Transform in conjunction with a 2D Convolutional Neural Network based classifier. Airwriting has great potential as a user-friendly modality to be used as an alternate input method in Human-Computer Interaction applications. △ Less

Submitted 31 October, 2022; originally announced October 2022.

arXiv:2206.07484 [pdf, ps, other]

Intelligent analysis of EEG signals to assess consumer decisions: A Study on Neuromarketing

Authors: Nikunj Phutela, Abhilash P, Kaushik Sreevathsan, B N Krupa

Abstract: Neuromarketing is an emerging field that combines neuroscience and marketing to understand the factors that influence consumer decisions better. The study proposes a method to understand consumers' positive and negative reactions to advertisements (ads) and products by analysing electroencephalogram (EEG) signals. These signals are recorded using a low-cost single electrode headset from volunteers… ▽ More Neuromarketing is an emerging field that combines neuroscience and marketing to understand the factors that influence consumer decisions better. The study proposes a method to understand consumers' positive and negative reactions to advertisements (ads) and products by analysing electroencephalogram (EEG) signals. These signals are recorded using a low-cost single electrode headset from volunteers belonging to the ages 18-22. A detailed subject dependent (SD) and subject independent (SI) analysis was performed employing machine learning methods like Naive Bayes (NB), Support Vector Machine (SVM), k-nearest neighbour and Decision Tree and the proposed deep learning (DL) model. SVM and NB yielded an accuracy (Acc.) of 0.63 for the SD analysis. In SI analysis, SVM performed better for the advertisement, product and gender-based analysis. Furthermore, the performance of the DL model was on par with that of SVM, especially, in product and ads-based analysis. △ Less

Submitted 29 May, 2022; originally announced June 2022.

Comments: 7 pages, 6 figures

arXiv:2205.01904 [pdf, other]

doi 10.1109/LSENS.2022.3206307

ImAiR: Airwriting Recognition framework using Image Representation of IMU Signals

Authors: Ayush Tripathi, Arnab Kumar Mondal, Lalan Kumar, Prathosh A. P

Abstract: The problem of Airwriting Recognition is focused on identifying letters written by movement of finger in free space. It is a type of gesture recognition where the dictionary corresponds to letters in a specific language. In particular, airwriting recognition using sensor data from wrist-worn devices can be used as a medium of user input for applications in Human-Computer Interaction (HCI). Recogni… ▽ More The problem of Airwriting Recognition is focused on identifying letters written by movement of finger in free space. It is a type of gesture recognition where the dictionary corresponds to letters in a specific language. In particular, airwriting recognition using sensor data from wrist-worn devices can be used as a medium of user input for applications in Human-Computer Interaction (HCI). Recognition of in-air trajectories using such wrist-worn devices is limited in literature and forms the basis of the current work. In this paper, we propose an airwriting recognition framework by first encoding the time-series data obtained from a wearable Inertial Measurement Unit (IMU) on the wrist as images and then utilizing deep learning-based models for identifying the written alphabets. The signals recorded from 3-axis accelerometer and gyroscope in IMU are encoded as images using different techniques such as Self Similarity Matrix (SSM), Gramian Angular Field (GAF) and Markov Transition Field (MTF) to form two sets of 3-channel images. These are then fed to two separate classification models and letter prediction is made based on an average of the class conditional probabilities obtained from the two models. Several standard model architectures for image classification such as variants of ResNet, DenseNet, VGGNet, AlexNet and GoogleNet have been utilized. Experiments performed on two publicly available datasets demonstrate the efficacy of the proposed strategy. The code for our implementation will be made available at https://github.com/ayushayt/ImAiR. △ Less

Submitted 8 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

arXiv:2112.14983 [pdf]

Exploring the pattern of Emotion in children with ASD as an early biomarker through Recurring-Convolution Neural Network (R-CNN)

Authors: Abirami S P, Kousalya G, Karthick R

Abstract: Autism Spectrum Disorder (ASD) is found to be a major concern among various occupational therapists. The foremost challenge of this neurodevelopmental disorder lies in the fact of analyzing and exploring various symptoms of the children at their early stage of development. Such early identification could prop up the therapists and clinicians to provide proper assistive support to make the children… ▽ More Autism Spectrum Disorder (ASD) is found to be a major concern among various occupational therapists. The foremost challenge of this neurodevelopmental disorder lies in the fact of analyzing and exploring various symptoms of the children at their early stage of development. Such early identification could prop up the therapists and clinicians to provide proper assistive support to make the children lead an independent life. Facial expressions and emotions perceived by the children could contribute to such early intervention of autism. In this regard, the paper implements in identifying basic facial expression and exploring their emotions upon a time variant factor. The emotions are analyzed by incorporating the facial expression identified through CNN using 68 landmark points plotted on the frontal face with a prediction network formed by RNN known as RCNN-FER system. The paper adopts R-CNN to take the advantage of increased accuracy and performance with decreased time complexity in predicting emotion as a textual network analysis. The papers proves better accuracy in identifying the emotion in autistic children when compared over simple machine learning models built for such identifications contributing to autistic society. △ Less

Submitted 30 December, 2021; originally announced December 2021.

Comments: 8 figures and 2 tables. totally 18 pages

arXiv:2111.12938 [pdf, other]

doi 10.1109/LSENS.2021.3139473

SCLAiR : Supervised Contrastive Learning for User and Device Independent Airwriting Recognition

Authors: Ayush Tripathi, Arnab Kumar Mondal, Lalan Kumar, Prathosh A. P

Abstract: Airwriting Recognition is the problem of identifying letters written in free space with finger movement. It is essentially a specialized case of gesture recognition, wherein the vocabulary of gestures corresponds to letters as in a particular language. With the wide adoption of smart wearables in the general population, airwriting recognition using motion sensors from a smart-band can be used as a… ▽ More Airwriting Recognition is the problem of identifying letters written in free space with finger movement. It is essentially a specialized case of gesture recognition, wherein the vocabulary of gestures corresponds to letters as in a particular language. With the wide adoption of smart wearables in the general population, airwriting recognition using motion sensors from a smart-band can be used as a medium of user input for applications in Human-Computer Interaction. There has been limited work in the recognition of in-air trajectories using motion sensors, and the performance of the techniques in the case when the device used to record signals is changed has not been explored hitherto. Motivated by these, a new paradigm for device and user-independent airwriting recognition based on supervised contrastive learning is proposed. A two stage classification strategy is employed, the first of which involves training an encoder network with supervised contrastive loss. In the subsequent stage, a classification head is trained with the encoder weights kept frozen. The efficacy of the proposed method is demonstrated through experiments on a publicly available dataset and also with a dataset recorded in our lab using a different device. Experiments have been performed in both supervised and unsupervised settings and compared against several state-of-the-art domain adaptation techniques. Data and the code for our implementation will be made available at https://github.com/ayushayt/SCLAiR. △ Less

Submitted 29 December, 2021; v1 submitted 25 November, 2021; originally announced November 2021.

arXiv:2110.12532 [pdf, ps, other]

doi 10.1109/WCNC51071.2022.9771783

Fronthaul Compression for Uplink Massive MIMO using Matrix Decomposition

Authors: Aswathylakshmi P, Radha Krishna Ganti

Abstract: Massive MIMO opens up attractive possibilities for next generation wireless systems with its large number of antennas offering spatial diversity and multiplexing gain. However, the fronthaul link that connects a massive MIMO Remote Radio Head (RRH) and carries IQ samples to the Baseband Unit (BBU) of the base station can throttle the network capacity/speed if appropriate data compression technique… ▽ More Massive MIMO opens up attractive possibilities for next generation wireless systems with its large number of antennas offering spatial diversity and multiplexing gain. However, the fronthaul link that connects a massive MIMO Remote Radio Head (RRH) and carries IQ samples to the Baseband Unit (BBU) of the base station can throttle the network capacity/speed if appropriate data compression techniques are not applied. In this paper, we propose an iterative technique for fronthaul load reduction in the uplink for massive MIMO systems that utilizes the convolution structure of the received signals. We use an alternating minimisation algorithm for blind deconvolution of the received data matrix that provides compression ratios of 30-50. In addition, the technique presented here can be used for blind decoding of OFDM signals in massive MIMO systems. △ Less

Submitted 24 October, 2021; originally announced October 2021.

Comments: 7 pages, 3 figures

Journal ref: Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), 2524-2529

arXiv:2109.05494 [pdf, other]

Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages

Authors: Anoop C S, Prathosh A P, A G Ramakrishnan

Abstract: Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language having enough annotated data to build an ASR. In such cases, we show that the domain-independent acoustic models lea… ▽ More Building an automatic speech recognition (ASR) system from scratch requires a large amount of annotated speech data, which is difficult to collect in many languages. However, there are cases where the low-resource language shares a common acoustic space with a high-resource language having enough annotated data to build an ASR. In such cases, we show that the domain-independent acoustic models learned from the high-resource language through unsupervised domain adaptation (UDA) schemes can enhance the performance of the ASR in the low-resource language. We use the specific example of Hindi in the source domain and Sanskrit in the target domain. We explore two architectures: i) domain adversarial training using gradient reversal layer (GRL) and ii) domain separation networks (DSN). The GRL and DSN architectures give absolute improvements of 6.71% and 7.32%, respectively, in word error rate over the baseline deep neural network model when trained on just 5.5 hours of data in the target domain. We also show that choosing a proper language (Telugu) in the source domain can bring further improvement. The results suggest that UDA schemes can be helpful in the development of ASR systems for low-resource languages, mitigating the hassle of collecting large amounts of annotated speech data. △ Less

Submitted 16 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

Comments: Submitted to ASRU 2021

arXiv:2102.05602 [pdf, other]

Systematic Generalization in Neural Networks-based Multivariate Time Series Forecasting Models

Authors: Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, Prathosh A. P

Abstract: Systematic generalization aims to evaluate reasoning about novel combinations from known components, an intrinsic property of human cognition. In this work, we study systematic generalization of NNs in forecasting future time series of dependent variables in a dynamical system, conditioned on past time series of dependent variables, and past and future control variables. We focus on systematic gen… ▽ More Systematic generalization aims to evaluate reasoning about novel combinations from known components, an intrinsic property of human cognition. In this work, we study systematic generalization of NNs in forecasting future time series of dependent variables in a dynamical system, conditioned on past time series of dependent variables, and past and future control variables. We focus on systematic generalization wherein the NN-based forecasting model should perform well on previously unseen combinations or regimes of control variables after being trained on a limited set of the possible regimes. For NNs to depict such out-of-distribution generalization, they should be able to disentangle the various dependencies between control variables and dependent variables. We hypothesize that a modular NN architecture guided by the readily-available knowledge of independence of control variables as a potentially useful inductive bias to this end. Through extensive empirical evaluation on a toy dataset and a simulated electric motor dataset, we show that our proposed modular NN architecture serves as a simple yet highly effective inductive bias that enabling better forecasting of the dependent variables up to large horizons in contrast to standard NNs, and indeed capture the true dependency relations between the dependent and the control variables. △ Less

Submitted 7 March, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: 9 pages, 8 figures, 2 tables

arXiv:2010.12940 [pdf, other]

Neural Compound-Word (Sandhi) Generation and Splitting in Sanskrit Language

Authors: Sushant Dave, Arun Kumar Singh, Prathosh A. P., Brejesh Lall

Abstract: This paper describes neural network based approaches to the process of the formation and splitting of word-compounding, respectively known as the Sandhi and Vichchhed, in Sanskrit language. Sandhi is an important idea essential to morphological analysis of Sanskrit texts. Sandhi leads to word transformations at word boundaries. The rules of Sandhi formation are well defined but complex, sometimes… ▽ More This paper describes neural network based approaches to the process of the formation and splitting of word-compounding, respectively known as the Sandhi and Vichchhed, in Sanskrit language. Sandhi is an important idea essential to morphological analysis of Sanskrit texts. Sandhi leads to word transformations at word boundaries. The rules of Sandhi formation are well defined but complex, sometimes optional and in some cases, require knowledge about the nature of the words being compounded. Sandhi split or Vichchhed is an even more difficult task given its non uniqueness and context dependence. In this work, we propose the route of formulating the problem as a sequence to sequence prediction task, using modern deep learning techniques. Being the first fully data driven technique, we demonstrate that our model has an accuracy better than the existing methods on multiple standard datasets, despite not using any additional lexical or morphological resources. The code is being made available at https://github.com/IITD-DataScience/Sandhi_Prakarana △ Less

Submitted 24 October, 2020; originally announced October 2020.

Comments: 6 pages, 3 figures, CODS-COMAD 2021, IIIT Bangalore, India

arXiv:2010.12937 [pdf, other]

A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis

Authors: Arun Kumar Singh, Sushant Dave, Prathosh A. P., Brejesh Lall, Shresth Mehta

Abstract: This paper presents first benchmark corpus of Sanskrit Pratyaya (suffix) and inflectional words (padas) formed due to suffixes along with neural network based approaches to process the formation and splitting of inflectional words. Inflectional words spans the primary and secondary derivative nouns as the scope of current work. Pratyayas are an important dimension of morphological analysis of Sans… ▽ More This paper presents first benchmark corpus of Sanskrit Pratyaya (suffix) and inflectional words (padas) formed due to suffixes along with neural network based approaches to process the formation and splitting of inflectional words. Inflectional words spans the primary and secondary derivative nouns as the scope of current work. Pratyayas are an important dimension of morphological analysis of Sanskrit texts. There have been Sanskrit Computational Linguistics tools for processing and analyzing Sanskrit texts. Unfortunately there has not been any work to standardize & validate these tools specifically for derivative nouns analysis. In this work, we prepared a Sanskrit suffix benchmark corpus called Pratyaya-Kosh to evaluate the performance of tools. We also present our own neural approach for derivative nouns analysis while evaluating the same on most prominent Sanskrit Morphological Analysis tools. This benchmark will be freely dedicated and available to researchers worldwide and we hope it will motivate all to improve morphological analysis in Sanskrit Language. △ Less

Submitted 24 October, 2020; originally announced October 2020.

Comments: 6 pages, 2 figures, EACL 2021 Submission

arXiv:2010.12227 [pdf, other]

Geometric Separability using Orthogonal Objects

Authors: Abidha V P, Pradeesha Ashok

Abstract: Given a bichromatic point set $P=\textbf{R} \cup \textbf{B}$ of red and blue points, a separator is an object of a certain type that separates $\textbf{R}$ and $\textbf{B}$. We study the geometric separability problem when the separator is a) rectangular annulus of fixed orientation b) rectangular annulus of arbitrary orientation c) square annulus of fixed orientation d) orthogonal convex polygon.… ▽ More Given a bichromatic point set $P=\textbf{R} \cup \textbf{B}$ of red and blue points, a separator is an object of a certain type that separates $\textbf{R}$ and $\textbf{B}$. We study the geometric separability problem when the separator is a) rectangular annulus of fixed orientation b) rectangular annulus of arbitrary orientation c) square annulus of fixed orientation d) orthogonal convex polygon. In this paper, we give polynomial time algorithms to construct separators of each of the above type that also optimizes a given parameter. △ Less

Submitted 28 January, 2022; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: 9 pages, 3 figures, 1 table

arXiv:2008.09466 [pdf, other]

RespVAD: Voice Activity Detection via Video-Extracted Respiration Patterns

Authors: Arnab Kumar Mondal, Prathosh A. P

Abstract: Voice Activity Detection (VAD) refers to the task of identification of regions of human speech in digital signals such as audio and video. While VAD is a necessary first step in many speech processing systems, it poses challenges when there are high levels of ambient noise during the audio recording. To improve the performance of VAD in such conditions, several methods utilizing the visual informa… ▽ More Voice Activity Detection (VAD) refers to the task of identification of regions of human speech in digital signals such as audio and video. While VAD is a necessary first step in many speech processing systems, it poses challenges when there are high levels of ambient noise during the audio recording. To improve the performance of VAD in such conditions, several methods utilizing the visual information extracted from the region surrounding the mouth/lip region of the speakers' video recording have been proposed. Even though these provide advantages over audio-only methods, they depend on faithful extraction of lip/mouth regions. Motivated by these, a new paradigm for VAD based on the fact that respiration forms the primary source of energy for speech production is proposed. Specifically, an audio-independent VAD technique using the respiration pattern extracted from the speakers' video is developed. The Respiration Pattern is first extracted from the video focusing on the abdominal-thoracic region of a speaker using an optical flow based method. Subsequently, voice activity is detected from the respiration pattern signal using neural sequence-to-sequence prediction models. The efficacy of the proposed method is demonstrated through experiments on a challenging dataset recorded in real acoustic environments and compared with four previous methods based on audio and visual cues. △ Less

Submitted 21 August, 2020; originally announced August 2020.

Comments: Accepted in IEEE Sensor Letters

arXiv:2005.02435 [pdf, other]

doi 10.1109/LSP.2020.2996935

Effect of The Latent Structure on Clustering with GANs

Authors: Deepak Mishra, Aravind Jayendran, Prathosh A. P

Abstract: Generative adversarial networks (GANs) have shown remarkable success in generation of data from natural data manifolds such as images. In several scenarios, it is desirable that generated data is well-clustered, especially when there is severe class imbalance. In this paper, we focus on the problem of clustering in generated space of GANs and uncover its relationship with the characteristics of th… ▽ More Generative adversarial networks (GANs) have shown remarkable success in generation of data from natural data manifolds such as images. In several scenarios, it is desirable that generated data is well-clustered, especially when there is severe class imbalance. In this paper, we focus on the problem of clustering in generated space of GANs and uncover its relationship with the characteristics of the latent space. We derive from first principles, the necessary and sufficient conditions needed to achieve faithful clustering in the GAN framework: (i) presence of a multimodal latent space with adjustable priors, (ii) existence of a latent space inversion mechanism and (iii) imposition of the desired cluster priors on the latent space. We also identify the GAN models in the literature that partially satisfy these conditions and demonstrate the importance of all the components required, through ablative studies on multiple real world image datasets. Additionally, we describe a procedure to construct a multimodal latent space which facilitates learning of cluster priors with sparse supervision. △ Less

Submitted 5 May, 2020; originally announced May 2020.

arXiv:2004.10174 [pdf]

Internet of Things(IoT) Based Multilevel Drunken Driving Detection and Prevention System Using Raspberry Pi 3

Authors: Viswanatha V, Venkata Siva Reddy R, Ashwini Kumari P, Pradeep Kumar S

Abstract: In this paper, the proposed system has demonstrated three ways of detecting alcohol level in the body of the car driver and prevent car driver from driving the vehicle by turning off the ignition system. It also sends messages to concerned people. In order to detect breath alcohol level MQ-3 sensor is included in this module along with a heartbeat sensor which can detect the heart beat rate of dri… ▽ More In this paper, the proposed system has demonstrated three ways of detecting alcohol level in the body of the car driver and prevent car driver from driving the vehicle by turning off the ignition system. It also sends messages to concerned people. In order to detect breath alcohol level MQ-3 sensor is included in this module along with a heartbeat sensor which can detect the heart beat rate of driver, facial recognition using webcam & MATLAB and a Wi-Fi module to send a message through the TCP/IP App, a Raspberry pi module to turn off the ignition and an alarm as prevention module. If a driver alcohol intake is more than the prescribed range, set by government the ignition will be made off provided either his heart beat abnormal or the driver is drowsy. In both the cases there will be a message sent to the App and from the App you can send it to family, friend, and well-wisher or nearest cop for the help. The system is developed considering the fact if driver is drunk and he needs a help, his friend can drive the car if he is not drunk. The safety of both the driver and the surroundings are aimed by this system and this aids in minimizing death cases by drunken driving and also burden on the cops. △ Less

Submitted 21 April, 2020; originally announced April 2020.

arXiv:2003.09293 [pdf, other]

U-Det: A Modified U-Net architecture with bidirectional feature network for lung nodule segmentation

Authors: Nikhil Varma Keetha, Samson Anosh Babu P, Chandra Sekhara Rao Annavarapu

Abstract: Early diagnosis and analysis of lung cancer involve a precise and efficient lung nodule segmentation in computed tomography (CT) images. However, the anonymous shapes, visual features, and surroundings of the nodule in the CT image pose a challenging problem to the robust segmentation of the lung nodules. This article proposes U-Det, a resource-efficient model architecture, which is an end to end… ▽ More Early diagnosis and analysis of lung cancer involve a precise and efficient lung nodule segmentation in computed tomography (CT) images. However, the anonymous shapes, visual features, and surroundings of the nodule in the CT image pose a challenging problem to the robust segmentation of the lung nodules. This article proposes U-Det, a resource-efficient model architecture, which is an end to end deep learning approach to solve the task at hand. It incorporates a Bi-FPN (bidirectional feature network) between the encoder and decoder. Furthermore, it uses Mish activation function and class weights of masks to enhance segmentation efficiency. The proposed model is extensively trained and evaluated on the publicly available LUNA-16 dataset consisting of 1186 lung nodules. The U-Det architecture outperforms the existing U-Net model with the Dice similarity coefficient (DSC) of 82.82% and achieves results comparable to human experts. △ Less

Submitted 20 March, 2020; originally announced March 2020.

Comments: 14 pages, 7 figures, 5 tables

Showing 1–50 of 60 results for author: p, A