Search | arXiv e-print repository

Open World Scene Graph Generation using Vision Language Models

Authors: Amartya Dutta, Kazi Sajeed Mehrab, Medha Sawhney, Abhilash Neog, Mridul Khurana, Sepideh Fatemi, Aanish Pradhan, M. Maruf, Ismini Lourentzou, Arka Daw, Anuj Karpatne

Abstract: Scene-Graph Generation (SGG) seeks to recognize objects in an image and distill their salient pairwise relationships. Most methods depend on dataset-specific supervision to learn the variety of interactions, restricting their usefulness in open-world settings involving novel objects and/or relations. Even methods that leverage large Vision Language Models (VLMs) typically require benchmark-specific fine-tuning. We introduce Open-World SGG, a training-free, efficient, model-agnostic framework that taps directly into the pretrained knowledge of VLMs to produce scene graphs with zero additional learning. Casting SGG as a zero-shot structured-reasoning problem, our method combines multimodal prompting, embedding alignment, and a lightweight pair-refinement strategy, enabling inference over unseen object vocabularies and relation sets. To assess this setting, we formalize an Open-World evaluation protocol that measures performance when no SGG-specific data have been observed in terms of either objects or relations. Experiments on Visual Genome, Open Images V6, and the Panoptic Scene Graph (PSG) dataset demonstrate the capacity of pretrained VLMs to perform relational understanding without task-level training.

Submitted 9 June, 2025; originally announced June 2025.

Comments: Accepted in CVPR 2025 Workshop (CVinW)

arXiv:2506.05629 [pdf, ps, other]

Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs

Authors: Ananth Muppidi, Abhilash Nandy, Sambaran Bandyopadhyay

Abstract: The performance of large language models in domain-specific tasks necessitates fine-tuning, which is computationally expensive and technically challenging. This paper focuses on parameter-efficient fine-tuning using soft prompting, a promising approach that adapts pre-trained models to downstream tasks by learning a small set of parameters. We propose a novel Input Dependent Soft Prompting technique with a self-Attention Mechanism (ID-SPAM) that generates soft prompts based on the input tokens and attends to different tokens with varying importance. Our method is simple and efficient, keeping the number of trainable parameters small. We show the merits of the proposed approach compared to state-of-the-art techniques on various tasks and show the improved zero-shot domain transfer capability.

Submitted 5 June, 2025; originally announced June 2025.

Comments: Accepted in ACL 2025 (Main) Conference

arXiv:2505.06548 [pdf, ps, other]

REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback

Authors: Aniruddha Roy, Pretam Ray, Abhilash Nandy, Somak Aditya, Pawan Goyal

Abstract: Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot or zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research endeavors have attempted to address this challenge by proposing frameworks capable of generating instructions in a semi-automated and task-agnostic manner directly from the model itself. Many of these efforts have relied on large API-only parameter-based models such as GPT-3.5 (175B), which are expensive and subject to limits on the number of queries. This paper explores the performance of three small open-source LLMs, namely LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B, using a semi-automated framework, thereby reducing the human intervention, effort, and cost required to generate an instruction dataset for fine-tuning LLMs. Furthermore, we demonstrate that incorporating a Reinforcement Learning (RL) based training algorithm into this LLM-based framework leads to further enhancements. Our evaluation of the dataset reveals that these RL-based frameworks achieve substantial improvements in 63-66% of the tasks compared to previous approaches.

Submitted 10 May, 2025; originally announced May 2025.

Comments: 11 pages

arXiv:2505.00949 [pdf, other]

Llama-Nemotron: Efficient Reasoning Models

Authors: Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani, Ido Shahaf, Oren Tropp, Ehud Karpas, Ran Zilberstein, Jiaqi Zeng, Soumye Singhal, Alexander Bukharin, Yian Zhang, Tugrul Konuk, Gerald Shen, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Yoshi Suhara, Olivier Delalleau, Zijia Chen, et al. (109 additional authors not shown)

Abstract: We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large-scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.

Submitted 14 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.06422 [pdf, other]

Retuve: Automated Multi-Modality Analysis of Hip Dysplasia with Open Source AI

Authors: Adam McArthur, Stephanie Wichuk, Stephen Burnside, Andrew Kirby, Alexander Scammon, Damian Sol, Abhilash Hareendranathan, Jacob L. Jaremko

Abstract: Developmental dysplasia of the hip (DDH) poses significant diagnostic challenges, hindering timely intervention. Current screening methodologies lack standardization, and AI-driven studies suffer from reproducibility issues due to limited data and code availability. To address these limitations, we introduce Retuve, an open-source framework for multi-modality DDH analysis, encompassing both ultrasound (US) and X-ray imaging. Retuve provides a complete and reproducible workflow, offering open datasets comprising expert-annotated US and X-ray images, pre-trained models with training code and weights, and a user-friendly Python Application Programming Interface (API). The framework integrates segmentation and landmark detection models, enabling automated measurement of key diagnostic parameters such as the alpha angle and acetabular index. By adhering to open-source principles, Retuve promotes transparency, collaboration, and accessibility in DDH research. This initiative has the potential to democratize DDH screening, facilitate early diagnosis, and ultimately improve patient outcomes by enabling widespread screening and early intervention. The GitHub repository/code can be found here: https://github.com/radoss-org/retuve

Submitted 8 April, 2025; originally announced April 2025.

Comments: 12 pages, 8 figures, submitted to Software Impacts

arXiv:2504.01879 [pdf, other]

TransientTables: Evaluating LLMs' Reasoning on Temporally Evolving Semi-structured Tables

Authors: Abhilash Shankarampeta, Harsh Mahajan, Tushar Kataria, Dan Roth, Vivek Gupta

Abstract: Humans continuously make new discoveries, and understanding the temporal sequence of events leading to these breakthroughs is essential for advancing science and society. This ability to reason over time allows us to identify future steps and understand the effects of financial and political decisions on our lives. However, large language models (LLMs) are typically trained on static datasets, limiting their ability to perform effective temporal reasoning. To assess the temporal reasoning capabilities of LLMs, we present the TRANSIENTTABLES dataset, which comprises 3,971 questions derived from over 14,000 tables, spanning 1,238 entities across multiple time periods. We introduce a template-based question-generation pipeline that harnesses LLMs to refine both templates and questions. Additionally, we establish baseline results using state-of-the-art LLMs to create a benchmark. We also introduce novel modeling strategies centered around task decomposition, enhancing LLM performance.

Submitted 2 April, 2025; originally announced April 2025.

Comments: 19 pages, 21 tables, 1 figure

arXiv:2502.19151 [pdf]

Design of Resistive Frequency Selective Surface based Radar Absorbing Structure-A Deep Learning Approach

Authors: Vijay Kumar Sutrakar, Nikhil Morge, Anjana PK, Abhilash PV

Abstract: In this paper, a deep learning-based approach for the design of radar absorbing structures using a resistive frequency selective surface is proposed. In the present design, the reflection coefficient is used as the input to the deep learning model, and the dimensions of the Jerusalem cross based unit cell are predicted as the output. A sequential neural network based deep learning model with the adaptive moment estimation optimizer is used for designing multi-frequency-band absorbers. The model is used for designing radar absorbers from L to Ka band depending on unit cell parameters and thickness. The output of the deep learning model is further compared with full-wave simulation software, and an excellent match is obtained. The proposed model can be used for the low-cost design of various radar absorbing structures using a single unit cell and thickness across the band of frequencies.

Submitted 26 February, 2025; originally announced February 2025.

arXiv:2502.18170 [pdf, ps, other]

Pauli measurements are not optimal for single-copy tomography

Authors: Jayadev Acharya, Abhilash Dharmavarapu, Yuhan Liu, Nengkun Yu

Abstract: Quantum state tomography is a fundamental problem in quantum computing. Given $n$ copies of an unknown $N$-qubit state $ρ \in \mathbb{C}^{d \times d}$, $d=2^N$, the goal is to learn the state up to an accuracy $ε$ in trace distance, with probability at least 0.99. We are interested in the copy complexity, the minimum number of copies of $ρ$ needed to fulfill the task. Pauli measurements have attracted significant attention due to their ease of implementation in limited settings. The best-known upper bound is $O(\frac{N \cdot 12^N}{ε^2})$, and no non-trivial lower bound is known besides the general single-copy lower bound $Ω(\frac{8^N}{ε^2})$, achieved by hard-to-implement structured POVMs such as MUB, SIC-POVM, and uniform POVM. We have made significant progress on this long-standing problem. We first prove a stronger upper bound of $O(\frac{10^N}{ε^2})$. We complement it with a lower bound of $Ω(\frac{9.118^N}{ε^2})$, which holds under adaptivity. To our knowledge, this demonstrates the first known separation between Pauli measurements and structured POVMs. The new lower bound is a consequence of a novel framework for adaptive quantum state tomography with measurement constraints. The main advantage over prior methods is that we can use measurement-dependent hard instances to prove tight lower bounds for Pauli measurements. Moreover, we connect the copy-complexity lower bound to the eigenvalues of the measurement information channel, which governs the measurement's capacity to distinguish states. To demonstrate the generality of the new framework, we obtain tight bounds for adaptive quantum tomography with $k$-outcome measurements, where we recover existing results and establish new ones.

Submitted 25 February, 2025; originally announced February 2025.

Comments: Accepted at STOC 2025

ACM Class: E.4; F.2.0; G.3

arXiv:2502.15785 [pdf, other]

Masking the Gaps: An Imputation-Free Approach to Time Series Modeling with Missing Data

Authors: Abhilash Neog, Arka Daw, Sepideh Fatemi Khorasgani, Anuj Karpatne

Abstract: A significant challenge in time-series (TS) modeling is the presence of missing values in real-world TS datasets. Traditional two-stage frameworks, involving imputation followed by modeling, suffer from two key drawbacks: (1) the propagation of imputation errors into subsequent TS modeling, and (2) the trade-offs between imputation efficacy and imputation complexity. While one-stage approaches attempt to address these limitations, they often struggle with scalability or with fully leveraging partially observed features. To this end, we propose a novel imputation-free approach for handling missing values in time series, termed Missing Feature-aware Time Series Modeling (MissTSM), with two main innovations. First, we develop a novel embedding scheme that treats every combination of time-step and feature (or channel) as a distinct token. Second, we introduce a novel Missing Feature-Aware Attention (MFAA) Layer to learn latent representations at every time-step based on partially observed features. We evaluate the effectiveness of MissTSM in handling missing values over multiple benchmark datasets.

Submitted 17 February, 2025; originally announced February 2025.

Comments: 15 pages
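
The two ideas named in the MissTSM abstract above (one token per observed time-step/feature pair, and attention restricted to observed features) can be illustrated with a minimal sketch. This is not the authors' MFAA layer: the function names `tokenize` and `masked_softmax`, the use of `None` for missing values, and the plain softmax weighting are all assumptions made for illustration only.

```python
import math

def tokenize(series):
    """Flatten a T x F series into (time_step, feature, value) tokens.

    Missing values are encoded as None (an assumption of this sketch) and
    produce no token, so nothing is imputed.
    """
    return [(t, f, v)
            for t, row in enumerate(series)
            for f, v in enumerate(row)
            if v is not None]

def masked_softmax(scores, observed):
    """Softmax over scores that gives zero weight to unobserved positions."""
    kept = [s for s, o in zip(scores, observed) if o]
    if not kept:
        # Nothing observed: return all-zero weights instead of dividing by zero.
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if o else 0.0 for s, o in zip(scores, observed)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-step, 2-feature series with two missing entries.
tokens = tokenize([[1.0, None], [None, 2.0]])
weights = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

In this sketch the missing entries never enter the computation, which is the sense in which such an approach is imputation-free; a real layer would apply the mask inside learned query-key attention.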

arXiv:2501.09668 [pdf, other]

Model Predictive Path Integral Docking of Fully Actuated Surface Vessel

Authors: Akash Vijayakumar, Atmanand M A, Abhilash Somayajula

Abstract: Autonomous docking remains one of the most challenging maneuvers in marine robotics, requiring precise control and robust perception in confined spaces. This paper presents a novel approach integrating Model Predictive Path Integral (MPPI) control with real-time LiDAR-based dock detection for autonomous surface vessel docking. Our framework uniquely combines probabilistic trajectory optimization with a multiobjective cost function that simultaneously considers docking precision, safety constraints, and motion efficiency. The MPPI controller generates optimal trajectories by intelligently sampling control sequences and evaluating their costs based on dynamic clearance requirements, orientation alignment, and target position objectives. We introduce an adaptive dock detection pipeline that processes LiDAR point clouds to extract critical geometric features, enabling real-time updates of docking parameters. The proposed method is extensively validated in a physics-based simulation environment that incorporates realistic sensor noise, vessel dynamics, and environmental constraints. Results demonstrate successful docking from various initial positions while maintaining safe clearances and smooth motion characteristics.

Submitted 16 January, 2025; originally announced January 2025.

Comments: 6 pages, 6 figures, 1 table, UT2025 Conference, IEEE International Symposium on Underwater Technology 2025
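
The abstract above describes MPPI as sampling control sequences and evaluating their costs. A minimal sketch of one generic MPPI update is given below, assuming a toy one-dimensional single-integrator model and a squared-distance cost; the function `mppi_step`, its parameters, and the dynamics are hypothetical and are not the paper's vessel model or its multiobjective docking cost.

```python
import math
import random

def mppi_step(x0, target, nominal, n_samples=256, sigma=0.5, lam=1.0, dt=0.1, seed=0):
    """One MPPI update of a nominal control sequence (toy 1-D example)."""
    rng = random.Random(seed)
    horizon = len(nominal)
    noises, costs = [], []
    for _ in range(n_samples):
        # Perturb the nominal controls with Gaussian noise.
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        x, cost = x0, 0.0
        for u, e in zip(nominal, eps):
            x += (u + e) * dt           # assumed single-integrator dynamics
            cost += (x - target) ** 2   # assumed cost: squared distance to target
        noises.append(eps)
        costs.append(cost)
    # Exponentially weight rollouts; subtracting the best cost avoids underflow.
    best = min(costs)
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    # Shift each nominal control by the weighted average of the sampled noise.
    return [u + sum(w * eps[k] for w, eps in zip(weights, noises)) / total
            for k, u in enumerate(nominal)]

updated = mppi_step(x0=0.0, target=1.0, nominal=[0.0] * 10)
```

Lower-cost rollouts receive exponentially larger weights, so the updated sequence leans toward perturbations that moved the state closer to the target; the temperature `lam` controls how sharply the best rollouts dominate.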

arXiv:2412.09222 [pdf, other]

Building a Privacy Web with SPIDEr -- Secure Pipeline for Information De-Identification with End-to-End Encryption

Authors: Novoneel Chakraborty, Anshoo Tandon, Kailash Reddy, Kaushal Kirpekar, Bryan Paul Robert, Hari Dilip Kumar, Abhilash Venkatesh, Abhay Sharma

Abstract: Data de-identification makes it possible to glean insights from data while preserving user privacy. The use of Trusted Execution Environments (TEEs) allows for the execution of de-identification applications on the cloud without the need for a user to trust the third-party application provider. In this paper, we present SPIDEr (Secure Pipeline for Information De-Identification with End-to-End Encryption), our implementation of an end-to-end encrypted data de-identification pipeline. SPIDEr supports classical anonymisation techniques such as suppression, pseudonymisation, generalisation, and aggregation, as well as techniques that offer a formal privacy guarantee such as k-anonymisation and differential privacy. To enable scalability and improve performance on constrained TEE hardware, we enable batch processing of data for differential privacy computations. We present our design of the control flows for end-to-end secure execution of de-identification operations within a TEE. As part of the control flow for running SPIDEr within the TEE, we perform attestation, a process that verifies that the software binaries were properly instantiated on a known, trusted platform.

Submitted 12 December, 2024; originally announced December 2024.

Comments: 3 pages, 2 figures

arXiv:2411.07550 [pdf, other]

Learning Autonomous Docking Operation of Fully Actuated Autonomous Surface Vessel from Expert data

Authors: Akash Vijayakumar, Atmanand M A, Abhilash Somayajula

Abstract: This paper presents an approach for autonomous docking of a fully actuated autonomous surface vessel using expert demonstration data. We frame the docking problem as an imitation learning task and employ inverse reinforcement learning (IRL) to learn a reward function from expert trajectories. A two-stage neural network architecture is implemented to incorporate both environmental context from sensors and vehicle kinematics into the reward function. The learned reward is then used with a motion planner to generate docking trajectories. Experiments in simulation demonstrate the effectiveness of this approach in producing human-like docking behaviors across different environmental configurations.

Submitted 11 November, 2024; originally announced November 2024.

Comments: 5 pages, 8 figures, IEEE Oceans Halifax 2024 Conference, Presented in September 2024 in IEEE Oceans Conference in Halifax, Canada as a Student Poster

arXiv:2410.22476 [pdf, other]

A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents

Authors: Ankan Mullick, Sombit Bose, Abhilash Nandy, Gajula Sai Chaitanya, Pawan Goyal

Abstract: In task-oriented dialogue systems, intent detection is crucial for interpreting user queries and providing appropriate responses. Existing research primarily addresses simple queries with a single intent, lacking effective systems for handling complex queries with multiple intents and extracting different intent spans. Additionally, there is a notable absence of multilingual, multi-intent datasets. This study addresses three critical tasks: extracting multiple intent spans from queries, detecting multiple intents, and developing a multilingual multi-label intent dataset. We introduce a novel multi-label multi-class intent detection dataset (MLMCID-dataset) curated from existing benchmark datasets. We also propose a pointer network-based architecture (MLMCID) to extract intent spans and detect multiple intents with coarse and fine-grained labels in the form of sextuplets. Comprehensive analysis demonstrates the superiority of our pointer network-based system over baseline approaches in terms of accuracy and F1-score across various datasets.

Submitted 29 October, 2024; originally announced October 2024.

Comments: Accepted at EMNLP 2024 Findings (Long Paper)

arXiv:2410.01400 [pdf, other]

CrowdCounter: A benchmark type-specific multi-target counterspeech dataset

Authors: Punyajoy Saha, Abhilash Datta, Abhik Jana, Animesh Mukherjee

Abstract: Counterspeech presents a viable alternative to banning or suspending users for hate speech while upholding freedom of expression. However, writing effective counterspeech is challenging for moderators/users. Hence, developing suggestion tools for writing counterspeech is the need of the hour. One critical challenge in developing such a tool is the lack of quality and diversity of the responses in the existing datasets. Hence, we introduce a new dataset, CrowdCounter, containing 3,425 hate speech-counterspeech pairs spanning six different counterspeech types (empathy, humor, questioning, warning, shaming, contradiction), which is the first of its kind. The design of our annotation platform itself encourages annotators to write type-specific, non-redundant and high-quality counterspeech. We evaluate two frameworks for generating counterspeech responses, vanilla and type-controlled prompts, across four large language models. In terms of metrics, we evaluate the responses using relevance, diversity and quality. We observe that Flan-T5 is the best of these models in the vanilla framework. Type-specific prompts enhance the relevance of the responses, although they might reduce the language quality. DialoGPT proves to be the best at following the instructions and generating the type-specific counterspeech accurately.

Submitted 2 October, 2024; originally announced October 2024.

Comments: 19 pages, 1 figure, 14 tables, code available at https://github.com/hate-alert/CrowdCounter

arXiv:2409.13592 [pdf, other]

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Authors: Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly

Abstract: Understanding satire and humor is a challenging task even for current Vision-Language Models. In this paper, we propose the challenging tasks of Satirical Image Detection (detecting whether an image is satirical), Understanding (generating the reason behind the image being satirical), and Completion (given one half of the image, selecting the other half from 2 given options, such that the complete image is satirical) and release a high-quality dataset YesBut, consisting of 2547 images, 1084 satirical and 1463 non-satirical, containing different artistic styles, to evaluate those tasks. Each satirical image in the dataset depicts a normal scenario, along with a conflicting scenario which is funny or ironic. Despite the success of current Vision-Language Models on multimodal tasks such as Visual QA and Image Captioning, our benchmarking experiments show that such models perform poorly on the proposed tasks on the YesBut Dataset in Zero-Shot Settings with respect to both automated and human evaluation. Additionally, we release a dataset of 119 real, satirical photographs for further research. The dataset and code are available at https://github.com/abhi1nandy2/yesbut_dataset.

Submitted 20 September, 2024; originally announced September 2024.

Comments: EMNLP 2024 Main (Long), 18 pages, 14 figures, 12 tables

arXiv:2409.06821 [pdf, other]

Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts

Authors: Assefa Seyoum Wahd, Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Adam McArthur, Jiechen Zhang, Jacob L. Jaremko, Abhilash Hareendranathan

Abstract: Foundation models like the Segment Anything Model (SAM) require high-quality manual prompts for medical image segmentation, and producing these prompts is time-consuming and requires expertise. SAM and its variants often fail to segment structures in ultrasound (US) images due to domain shift. We propose Sam2Rad, a prompt learning approach to adapt SAM and its variants for US bone segmentation without human prompts. It introduces a prompt predictor network (PPN) with a cross-attention module to predict prompt embeddings from image encoder features. PPN outputs bounding box and mask prompts, and 256-dimensional embeddings for regions of interest. The framework allows optional manual prompting and can be trained end-to-end using parameter-efficient fine-tuning (PEFT). Sam2Rad was tested on 3 musculoskeletal US datasets: wrist (3822 images), rotator cuff (1605 images), and hip (4849 images). It improved performance across all datasets without manual prompts, increasing Dice scores by 2-7% for hip/wrist and up to 33% for shoulder data. Sam2Rad can be trained with as few as 10 labeled images and is compatible with any SAM architecture for automatic segmentation.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2408.16387 [pdf, other]

Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks

Authors: Haritha K, Ramya Burra, Srishti Mittal, Sarthak Sharma, Abhilash Venkatesh, Anshoo Tandon

Abstract: This work contributes towards the development of an efficient and scalable open-source Secure Multi-Party Computation (SMPC) protocol on machines with moderate computational resources. We use the ABY2.0 SMPC protocol implemented on the C++-based MOTION2NX framework for secure convolutional neural network (CNN) inference with semi-honest security. Our contributions are as follows. Firstly, we enhance MOTION2NX by providing a tensorized version of several primitive functions including the Hadamard product, indicator function and argmax function. Secondly, we adapt an existing Helper node algorithm, working in tandem with the ABY2.0 protocol, for efficient convolution computation to reduce execution time and RAM usage. Thirdly, we present a novel splitting algorithm that divides the computations at each CNN layer into multiple configurable chunks. This novel splitting algorithm, which provides a significant reduction in RAM usage, is of independent interest and is applicable to general SMPC protocols.

Submitted 24 October, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: 20 pages, 1 figure. arXiv admin note: text overlap with arXiv:2310.10133

arXiv:2408.16176 [pdf, other]

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Authors: M. Maruf, Arka Daw, Kazi Sajeed Mehrab, Harish Babu Manogaran, Abhilash Neog, Medha Sawhney, Mridul Khurana, James P. Balhoff, Yasin Bakis, Bahadir Altintas, Matthew J. Thompson, Elizabeth G. Campolongo, Josef C. Uyeda, Hilmar Lapp, Henry L. Bart, Paula M. Mabee, Yu Su, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Wasila Dahdul, Anuj Karpatne

Abstract: Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images. The code and datasets for running all the analyses reported in this paper can be found at https://github.com/sammarfy/VLM4Bio.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 36 pages, 37 figures, 7 tables

arXiv:2408.04886 [pdf]

Automated PMC-based Power Modeling Methodology for Modern Mobile GPUs

Authors: Pranab Dash, Y. Charlie Hu, Abhilash Jindal

Abstract: The rise of machine learning workloads on smartphones has propelled GPUs into one of the most power-hungry components of modern smartphones and elevated the need for optimizing the GPU power draw by mobile apps. Optimizing the power consumption of mobile GPUs in turn requires accurate estimation of their power draw during app execution. In this paper, we observe that the prior-art, utilization-frequency based GPU models cannot capture the diverse micro-architectural usage of modern mobile GPUs. We show that these models suffer poor modeling accuracy under diverse GPU workloads, and study whether performance monitoring counter (PMC)-based models recently proposed for desktop/server GPUs can be applied to accurately model mobile GPU power. Our study shows that the PMCs that come with dominating mobile GPUs used in modern smartphones are sufficient to model mobile GPU power, but exhibit multicollinearity if used altogether. We present APGPM, the mobile GPU power modeling methodology that automatically selects an optimal set of PMCs that maximizes the GPU power model accuracy. Evaluation on two representative mobile GPUs shows that APGPM-generated GPU power models reduce the MAPE modeling error of prior-art by 1.95x to 2.66x (i.e., by 11.3% to 15.4%) while using only 4.66% to 20.41% of the total number of available PMCs.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2407.14202 [pdf, other]

SHS: Scorpion Hunting Strategy Swarm Algorithm

Authors: Abhilash Singh, Seyed Muhammad Hossein Mousavi, Kumar Gaurav

Abstract: We introduce the Scorpion Hunting Strategy (SHS), a novel population-based, nature-inspired optimisation algorithm. This algorithm draws inspiration from the hunting strategy of scorpions, which identify, locate, and capture their prey using the alpha and beta vibration operators. These operators control the SHS algorithm's exploitation and exploration abilities. To formulate an optimisation method, we mathematically simulate these dynamic events and behaviors. We evaluate the effectiveness of the SHS algorithm by employing 20 benchmark functions (including 10 conventional and 10 CEC2020 functions), using both qualitative and quantitative analyses. Through a comparative analysis with 12 state-of-the-art meta-heuristic algorithms, we demonstrate that the proposed SHS algorithm yields exceptionally promising results. These findings are further supported by statistically significant results obtained through the Wilcoxon rank sum test. Additionally, the ranking of SHS, as determined by the average rank derived from the Friedman test, positions it at the forefront when compared to other algorithms. Going beyond theoretical validation, we showcase the practical utility of the SHS algorithm by applying it to six distinct real-world optimisation tasks. These applications illustrate the algorithm's potential in addressing complex optimisation challenges. In summary, this work not only introduces the innovative SHS algorithm but also substantiates its effectiveness and versatility through rigorous benchmarking and real-world problem-solving scenarios.

Submitted 30 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.08027 [pdf, other]

Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images

Authors: Kazi Sajeed Mehrab, M. Maruf, Arka Daw, Abhilash Neog, Harish Babu Manogaran, Mridul Khurana, Zhenyang Feng, Bahadir Altintas, Yasin Bakis, Elizabeth G Campolongo, Matthew J Thompson, Xiaojun Wang, Hilmar Lapp, Tanya Berger-Wolf, Paula Mabee, Henry Bart, Wei-Lun Chao, Wasila M Dahdul, Anuj Karpatne

Abstract: We introduce Fish-Visual Trait Analysis (Fish-Vista), the first organismal image dataset designed for the analysis of visual traits of aquatic species directly from images using problem formulations in computer vision. Fish-Vista contains 69,126 annotated images spanning 4,154 fish species, curated and organized to serve three downstream tasks of species classification, trait identification, and trait segmentation. Our work makes two key contributions. First, we perform a fully reproducible data processing pipeline to process images sourced from various museum collections. We annotate these images with carefully curated labels from biological databases and manual annotations to create an AI-ready dataset of visual traits, contributing to the advancement of AI in biodiversity science. Second, our proposed downstream tasks offer fertile grounds for novel computer vision research in addressing a variety of challenges such as long-tailed distributions, out-of-distribution generalization, learning with weak labels, explainable AI, and segmenting small objects. We benchmark the performance of several existing methods for our proposed tasks to expose future research opportunities in AI for biodiversity science problems involving visual traits.

Submitted 27 February, 2025; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Preprint. Accepted to CVPR 2025

arXiv:2407.04560 [pdf, other]

Real Time Emotion Analysis Using Deep Learning for Education, Entertainment, and Beyond

Authors: Abhilash Khuntia, Shubham Kale

Abstract: The significance of emotion detection is increasing in education, entertainment, and various other domains. We are developing a system that can identify and transform facial expressions into emojis to provide immediate feedback. The project consists of two components. Initially, we will employ sophisticated image processing techniques and neural networks to construct a deep learning model capable of precisely categorising facial expressions. Next, we will develop a basic application that records live video using the camera on your device. The app will use this model to analyse facial expressions and promptly display the corresponding emojis. Our objective is to develop a dynamic tool that integrates deep learning and real-time video processing for the purposes of online education, virtual events, gaming, and enhancing user experience. This tool enhances interactions and introduces novel emotional intelligence technologies.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 8 pages, 23 figures

arXiv:2407.03305 [pdf, other]

Advanced Smart City Monitoring: Real-Time Identification of Indian Citizen Attributes

Authors: Shubham Kale, Shashank Sharma, Abhilash Khuntia

Abstract: This project focuses on creating a smart surveillance system for Indian cities that can identify and analyze people's attributes in real time. Using advanced technologies like artificial intelligence and machine learning, the system can recognize attributes such as upper body color, what the person is wearing, accessories they are wearing, headgear, etc., and analyze behavior through cameras installed around the city.

Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 6 pages, 8 figures; title changed and some alignment issues resolved, other content remains the same

arXiv:2406.09722 [pdf, other]

doi 10.1109/ACCESS.2024.3507280

Cross-view geo-localization: a survey

Authors: Abhilash Durgam, Sidike Paheding, Vikas Dhiman, Vijay Devabhaktuni

Abstract: Cross-view geo-localization has garnered notable attention in the realm of computer vision, spurred by the widespread availability of copious geotagged datasets and the advancements in machine learning techniques. This paper provides a thorough survey of cutting-edge methodologies, techniques, and associated challenges that are integral to this domain, with a focus on feature-based and deep learning strategies. Feature-based methods capitalize on unique features to establish correspondences across disparate viewpoints, whereas deep learning-based methodologies deploy convolutional neural networks to embed view-invariant attributes. This work also delineates the multifaceted challenges encountered in cross-view geo-localization, such as variations in viewpoints and illumination, the occurrence of occlusions, and it elucidates innovative solutions that have been formulated to tackle these issues. Furthermore, we delineate benchmark datasets and relevant evaluation metrics, and also perform a comparative analysis of state-of-the-art techniques. Finally, we conclude the paper with a discussion on prospective avenues for future research and the burgeoning applications of cross-view geo-localization in an intricately interconnected global landscape.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2404.17912 [pdf, other]

SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models

Authors: Manav Nitin Kapadnis, Sohan Patnaik, Abhilash Nandy, Sourjyadip Ray, Pawan Goyal, Debdoot Sheet

Abstract: Radiology Report Generation (R2Gen) demonstrates how Multi-modal Large Language Models (MLLMs) can automate the creation of accurate and coherent radiological reports. Existing methods often hallucinate details in text-based reports that don't accurately reflect the image content. To mitigate this, we introduce a novel strategy, SERPENT-VLM (SElf Refining Radiology RePort GENeraTion using Vision Language Models), which improves the R2Gen task by integrating a self-refining mechanism into the MLLM framework. We employ a unique self-supervised loss that leverages similarity between pooled image representations and the contextual representations of the generated radiological text, alongside the standard Causal Language Modeling objective, to refine image-text representations. This allows the model to scrutinize and align the generated text through dynamic interaction between a given image and the generated text, therefore reducing hallucination and continuously enhancing nuanced report generation. SERPENT-VLM outperforms existing baselines such as LLaVA-Med, BiomedGPT, etc., achieving SoTA performance on the IU X-ray and Radiology Objects in COntext (ROCO) datasets, and also proves to be robust against noisy images. A qualitative case study emphasizes the significant advancements towards more sophisticated MLLM frameworks for R2Gen, opening paths for further research into self-supervised refinement in the medical imaging domain.

Submitted 18 July, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures, 4 tables, Accepted as oral at Clinical NLP workshop at NAACL 2024

arXiv:2404.04676 [pdf, other]

Order-Based Pre-training Strategies for Procedural Text Understanding

Authors: Abhilash Nandy, Yash Kulkarni, Pawan Goyal, Niloy Ganguly

Abstract: In this paper, we propose sequence-based pretraining methods to enhance procedural understanding in natural language processing. Procedural text, containing sequential instructions to accomplish a task, is difficult to understand due to the changing attributes of entities in the context. We focus on recipes, which are commonly represented as ordered instructions, and use this order as a supervision signal. Our work is one of the first to compare several 'order-as-supervision' transformer pre-training methods, including Permutation Classification, Embedding Regression, and Skip-Clip, and shows that these methods give improved results compared to the baselines and SoTA LLMs on two downstream Entity-Tracking datasets: the NPN-Cooking dataset in the recipe domain and the ProPara dataset in the open domain. Our proposed methods address the non-trivial Entity Tracking task, which requires predicting entity states across procedure steps and therefore understanding the order of steps. These methods show an improvement over the best baseline of 1.6% and 7-9% on the NPN-Cooking and ProPara datasets, respectively, across metrics.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 8 pages (Accepted for publication at NAACL 2024 (Main Conference))

arXiv:2404.01329 [pdf, other]

Unraveling the Dynamics of Television Debates and Social Media Engagement: Insights from an Indian News Show

Authors: Kiran Garimella, Abhilash Datta

Abstract: The relationship between television shows and social media has become increasingly intertwined in recent years. Social media platforms, particularly Twitter, have emerged as significant sources of public opinion and discourse on topics discussed in television shows. In India, news debates leverage the popularity of social media to promote hashtags and engage users in discussions and debates on a daily basis. This paper focuses on the analysis of one of India's most prominent and widely-watched TV news debate shows: "Arnab Goswami-The Debate". The study examines the content of the show by analyzing the hashtags used to promote it and the social media data corresponding to these hashtags. The findings reveal that the show exhibits a strong bias towards the ruling Bharatiya Janata Party (BJP), with over 60% of the debates featuring either pro-BJP or anti-opposition content. Social media support for the show primarily comes from BJP supporters. Notably, BJP leaders and influencers play a significant role in promoting the show on social media, leveraging their existing networks and resources to artificially trend specific hashtags. Furthermore, the study uncovers a reciprocal flow of information between the TV show and social media. We find evidence that the show's choice of topics is linked to social media posts made by party workers, suggesting a dynamic interplay between traditional media and online platforms.
By exploring the complex interaction between television debates and social media support, this study contributes to a deeper understanding of the evolving relationship between these two domains in the digital age. The findings hold implications for media researchers and practitioners, offering insights into the ways in which social media can influence traditional media and vice versa.

Submitted 29 March, 2024; originally announced April 2024.

Comments: Accepted at ICWSM 2024. Please cite the ICWSM version

arXiv:2403.04670 [pdf, other]

End-to-end Conditional Robust Optimization

Authors: Abhilash Chenreddy, Erick Delage

Abstract: The field of Contextual Optimization (CO) integrates machine learning and optimization to solve decision-making problems under uncertainty. Recently, a risk-sensitive variant of CO, known as Conditional Robust Optimization (CRO), combines uncertainty quantification with robust optimization in order to promote safety and reliability in high-stakes applications. Exploiting modern differentiable optimization methods, we propose a novel end-to-end approach to train a CRO model in a way that accounts for both the empirical risk of the prescribed decisions and the quality of conditional coverage of the contextual uncertainty set that supports them. While guarantees of success for the latter objective are impossible to obtain from the point of view of conformal prediction theory, high-quality conditional coverage is achieved empirically by ingeniously employing a logistic regression differentiable layer within the calculation of coverage quality in our training loss. We show that the proposed training algorithms produce decisions that outperform the traditional estimate-then-optimize approaches.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.14300 [pdf, other]

A Simple Framework Uniting Visual In-context Learning with Masked Image Modeling to Improve Ultrasound Segmentation

Authors: Yuyue Zhou, Banafshe Felfeliyan, Shrimanti Ghosh, Jessica Knight, Fatima Alves-Pereira, Christopher Keen, Jessica Küpper, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko

Abstract: Conventional deep learning models deal with images one-by-one, requiring costly and time-consuming expert labeling in the field of medical imaging, and domain-specific restriction limits model generalizability. Visual in-context learning (ICL) is a new and exciting area of research in computer vision. Unlike conventional deep learning, ICL emphasizes the model's ability to adapt quickly to new tasks based on given examples. Inspired by MAE-VQGAN, we proposed a new simple visual ICL method called SimICL, combining visual ICL pairing images with masked image modeling (MIM) designed for self-supervised learning. We validated our method on bony structures segmentation in a wrist ultrasound (US) dataset with limited annotations, where the clinical objective was to segment bony structures to help with further fracture detection. We used a test set containing 3822 images from 18 patients for bony region segmentation. SimICL achieved a remarkably high Dice coefficient (DC) of 0.96 and Jaccard Index (IoU) of 0.92, surpassing state-of-the-art segmentation and visual ICL models (a maximum DC 0.86 and IoU 0.76), with SimICL DC and IoU increasing up to 0.10 and 0.16. This remarkably high agreement with limited manual annotations indicates SimICL could be used for training AI models even on small US datasets. This could dramatically decrease the human expert time required for image labeling compared to conventional approaches, and enhance the real-world use of AI assistance in US image analysis.

Submitted 8 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.06331 [pdf]

Application Of Vision-Language Models For Assessing Osteoarthritis Disease Severity

Authors: Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Jessica Kupper, Shaobo Liu, Abhilash Hareendranathan, Jacob L. Jaremko

Abstract: Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods. Current radiographic assessments are time-consuming and prone to variability, prompting the need for automated solutions. The existing deep learning models for OA assessment are unimodal, single-task systems, and they don't incorporate relevant text information such as patient demographics, disease history, or physician reports. This study investigates employing Vision Language Processing (VLP) models to predict OA severity using X-ray images and corresponding reports. Our method leverages X-ray images of the knee and diverse report templates generated from tabular OA scoring values to train a CLIP (Contrastive Language Image PreTraining) style VLP model. Furthermore, we incorporate additional contrasting captions to force the model to discriminate between positive and negative reports. Results demonstrate the efficacy of these models in learning text-image representations and their contextual relationships, showcase potential advancement in OA assessment, and establish a foundation for specialized vision language models in medical contexts.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2311.02216 [pdf, other]

Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data

Authors: Mubashara Akhtar, Abhilash Shankarampeta, Vivek Gupta, Arpit Patil, Oana Cocarascu, Elena Simperl

Abstract: Numbers are crucial for various real-world domains such as finance, economics, and science. Thus, understanding and reasoning with numbers are essential skills for language models to solve different tasks. While different numerical benchmarks have been introduced in recent years, they are mostly limited to specific numerical aspects. In this paper, we propose a hierarchical taxonomy for numerical reasoning skills with more than ten reasoning types across four levels: representation, number sense, manipulation, and complex reasoning. We conduct a comprehensive evaluation of state-of-the-art models to identify reasoning challenges specific to them. To this end, we develop a diverse set of numerical probes employing a semi-automated approach. We focus on the tabular Natural Language Inference (TNLI) task as a case study and measure models' performance shifts. Our results show that no model consistently excels across all numerical reasoning types. Among the probed models, FlanT5 (few-/zero-shot) and GPT-3.5 (few-shot) demonstrate strong overall numerical reasoning skills compared to other models. Label-flipping probes indicate that models often exploit dataset artifacts to predict the correct labels.

Submitted 3 November, 2023; originally announced November 2023.

Comments: Accepted at EMNLP 2023 (Findings)

arXiv:2310.16048 [pdf, ps, other]

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications

Authors: Abhilash Mishra

Abstract: Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment. RLHF uses feedback from human reinforcers to fine-tune outputs; all widely deployed large language models (LLMs) use RLHF to align their outputs to human values. It is critical to understand the limitations of RLHF and consider policy challenges arising from these limitations. In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms. Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol to universally align AI systems using RLHF through democratic processes. Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user, i.e., universal AI alignment using RLHF is impossible. We discuss policy implications for the governance of AI systems built using RLHF: first, the need for mandating transparent voting rules to hold model builders accountable; second, the need for model builders to focus on developing AI agents that are narrowly aligned to specific user groups.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 10 pages, no figures

arXiv:2310.14326 [pdf, other]

CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text

Authors: Abhilash Nandy, Manav Nitin Kapadnis, Pawan Goyal, Niloy Ganguly

Abstract: In this paper, we propose CLMSM, a domain-specific, continual pre-training framework, that learns from a large set of procedural recipes. CLMSM uses a Multi-Task Learning Framework to optimize two objectives - a) Contrastive Learning using hard triplets to learn fine-grained differences across entities in the procedures, and b) a novel Mask-Step Modelling objective to learn step-wise context of a procedure. We test the performance of CLMSM on the downstream tasks of tracking entities and aligning actions between two procedures on three datasets, one of which is an open-domain dataset not conforming with the pre-training dataset. We show that CLMSM not only outperforms baselines on recipes (in-domain) but is also able to generalize to open-domain procedural NLP tasks.

Submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP Findings 2023, 14 pages, 4 figures

arXiv:2310.05567 [pdf, other]

doi 10.1016/j.oceaneng.2023.116011

Collision Avoidance for Autonomous Surface Vessels using Novel Artificial Potential Fields

Authors: Aditya Kailas Jadhav, Anantha Raj Pandi, Abhilash Somayajula

Abstract: As the demand for transportation through waterways continues to rise, the number of vessels plying the waters has correspondingly increased. This has resulted in a greater number of accidents and collisions between ships, some of which lead to significant loss of life and financial losses. Research has shown that human error is a major factor responsible for such incidents. The maritime industry is constantly exploring newer approaches to autonomy to mitigate this issue. This study presents the use of novel Artificial Potential Fields (APFs) to perform obstacle and collision avoidance in marine environments. This study highlights the advantage of harmonic functions over traditional functions in modeling potential fields. With a modification, the method is extended to effectively avoid dynamic obstacles while adhering to COLREGs. Improved performance is observed as compared to the traditional potential fields and also against the popular velocity obstacle approach. A comprehensive statistical analysis is also performed through Monte Carlo simulations in different congested environments that emulate real traffic conditions to demonstrate robustness of the approach.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 28 pages, 30 figures

Journal ref: Ocean Engineering, 288 (2023), 116011

arXiv:2310.00205 [pdf, other]

Finding 709 Defects in 258 Projects: An Experience Report on Applying CodeQL to Open-Source Embedded Software (Experience Paper) -- Extended Report

Authors: Mingjie Shen, Akul Abhilash Pillai, Brian A. Yuan, James C. Davis, Aravind Machiry

Abstract: In this experience paper, we report on a large-scale empirical study of Static Application Security Testing (SAST) in Open-Source Embedded Software (EMBOSS) repositories. We collected a corpus of 258 of the most popular EMBOSS projects, and then measured their use of SAST tools via program analysis and a survey (N=25) of their developers. Advanced SAST tools are rarely used -- only 3% of projects go beyond trivial compiler analyses. Developers cited the perception of ineffectiveness and false positives as reasons for limited adoption. Motivated by this deficit, we applied the state-of-the-art (SOTA) CodeQL SAST tool and measured its ease of use and actual effectiveness. Across the 258 projects, CodeQL reported 709 true defects with a false positive rate of 34%. There were 535 (75%) likely security vulnerabilities, including in major projects maintained by Microsoft, Amazon, and the Apache Foundation. EMBOSS engineers have confirmed 376 (53%) of these defects, mainly by accepting our pull requests. Two CVEs were issued. Based on these results, we proposed pull requests to include our workflows as part of EMBOSS Continuous Integration (CI) pipelines; 37 (71% of active repositories) of these are already merged. In summary, we urge EMBOSS engineers to adopt the current generation of SAST tools, which offer low false positive rates and are effective at finding security-relevant defects.

Submitted 25 April, 2025; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: This is the extended version of: Mingjie Shen, Akul Abhilash Pillai, Brian A. Yuan, James C. Davis, and Aravind Machiry. 2025. Finding 709 Defects in 258 Projects: An Experience Report on Applying CodeQL to Open-Source Embedded Software (Experience Paper). Proc. ACM Softw. Eng. 2, ISSTA, Article ISSTA048 (July 2025), 24 pages. https://doi.org/10.1145/3728923

arXiv:2309.09490 [pdf, other]

Self-supervised TransUNet for Ultrasound regional segmentation of the distal radius in children

Authors: Yuyue Zhou, Jessica Knight, Banafshe Felfeliyan, Christopher Keen, Abhilash Rakkunedeth Hareendranathan, Jacob L. Jaremko

Abstract: Supervised deep learning offers great promise to automate analysis of medical images from segmentation to diagnosis. However, their performance highly relies on the quality and quantity of the data annotation. Meanwhile, curating large annotated datasets for medical images requires a high level of expertise, which is time-consuming and expensive. Recently, to quench the thirst for large data sets with high-quality annotation, self-supervised learning (SSL) methods using unlabeled domain-specific data, have attracted attention. Therefore, designing an SSL method that relies on minimal quantities of labeled data has far-reaching significance in medical images. This paper investigates the feasibility of deploying the Masked Autoencoder for SSL (SSL-MAE) of TransUNet, for segmenting bony regions from children's wrist ultrasound scans. We found that changing the embedding and loss function in SSL-MAE can produce better downstream results compared to the original SSL-MAE. In addition, we determined that only pretraining TransUNet embedding and encoder with SSL-MAE does not work as well as TransUNet without SSL-MAE pretraining on downstream segmentation tasks.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.05497 [pdf, other]

Personality Detection and Analysis using Twitter Data

Authors: Abhilash Datta, Souvic Chakraborty, Animesh Mukherjee

Abstract: Personality types are important in various fields as they hold relevant information about the characteristics of a human being in an explainable format. They are often good predictors of a person's behaviors in a particular environment and have applications ranging from candidate selection to marketing and mental health. Recently automatic detection of personality traits from texts has gained significant attention in computational linguistics. Most personality detection and analysis methods have focused on small datasets making their experimental observations often limited. To bridge this gap, we focus on collecting and releasing the largest automatically curated dataset for the research community which has 152 million tweets and 56 thousand data points for the Myers-Briggs personality type (MBTI) prediction task. We perform a series of extensive qualitative and quantitative studies on our dataset to analyze the data patterns in a better way and infer conclusions. We show how our intriguing analysis results often follow natural intuition. We also perform a series of ablation studies to show how the baselines perform for our dataset.

Submitted 11 September, 2023; originally announced September 2023.

Comments: Submitted to ASONAM 2023

arXiv:2307.05911 [pdf, other]

Grain and Grain Boundary Segmentation using Machine Learning with Real and Generated Datasets

Authors: Peter Warren, Nandhini Raju, Abhilash Prasad, Shajahan Hossain, Ramesh Subramanian, Jayanta Kapat, Navin Manjooran, Ranajay Ghosh

Abstract: We report significantly improved accuracy of grain boundary segmentation using Convolutional Neural Networks (CNN) trained on a combination of real and generated data. Manual segmentation is accurate but time-consuming, and existing computational methods are faster but often inaccurate. To combat this dilemma, machine learning models can be used to achieve the accuracy of manual segmentation and have the efficiency of a computational method. To build an extensive dataset, 316L stainless steel samples were additively manufactured, prepared, polished, and etched, and microstructure grain images were then systematically collected. Grain segmentation via existing computational methods and manual (by-hand) segmentation were conducted to create "real" training data. A Voronoi tessellation pattern, combined with random synthetic noise and simulated defects, is developed to create a novel artificial grain image fabrication method. This provided training data supplementation for data-intensive machine learning methods. The accuracy of the grain measurements from microstructure images segmented via computational methods and the machine learning methods proposed in this work is calculated and compared to provide benchmarks in grain segmentation. Over 400 images of the microstructure of stainless steel samples were manually segmented for machine learning training applications. This data and the artificial data are available on Kaggle.

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2306.17173 [pdf, other]

Photon: A Cross Platform P2P Data Transfer Application

Authors: Abhilash Shreedhar Hegde, Amruta Narayana Hegde, Adeep Krishna Keelar, Ananya Mathur

Abstract: Modern computing requires efficient and dependable data transport. Current solutions like Bluetooth, SMS (Short Message Service), and Email have their restrictions on efficiency, file size, compatibility, and cost. In order to facilitate direct communication and resource sharing amongst linked devices, this research study offers a cross-platform peer-to-peer (P2P) data transmission solution that takes advantage of P2P networks' features. The system enables cost-effective and high-performance data transport by using the compute, storage, and network resources of the participating devices. Simple file sharing, adaptability, dependability, and high performance are some of the important benefits. The examination of the suggested solution is presented in this paper and includes discussion of the P2P architecture, data transfer mechanisms, performance assessment, implementation issues, security concerns, and the potential difficulties that need to be addressed. The research intends to validate the efficacy and potential of the suggested cross-platform P2P data transfer solution, delivering better efficiency and dependability for users across various platforms, through practical investigations and comparisons with existing approaches.

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.10374 [pdf, ps, other]

A Survey of Contextual Optimization Methods for Decision Making under Uncertainty

Authors: Utsav Sadana, Abhilash Chenreddy, Erick Delage, Alexandre Forel, Emma Frejinger, Thibaut Vidal

Abstract: Recently there has been a surge of interest in operations research (OR) and the machine learning (ML) community in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty. This gave rise to the field of contextual optimization, under which data-driven procedures are developed to prescribe actions to the decision-maker that make the best use of the most recently updated information. A large variety of models and methods have been presented in both OR and ML literature under a variety of names, including data-driven optimization, prescriptive optimization, predictive stochastic programming, policy optimization, (smart) predict/estimate-then-optimize, decision-focused learning, (task-based) end-to-end learning/forecasting/optimization, etc. Focusing on single and two-stage stochastic programming problems, this review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under a uniform notation and terminology and classify them according to the three main frameworks identified. Our objective with this survey is to both strengthen the general understanding of this active field of research and stimulate further theoretical and algorithmic advancements in integrating ML and stochastic programming.

Submitted 2 February, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

arXiv:2306.06190 [pdf, other]

$FastDoc$: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy

Authors: Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, Yash Parag Butala, Pawan Goyal, Niloy Ganguly

Abstract: In this paper, we propose $FastDoc$ (Fast Continual Pre-training Technique using Document Level Metadata and Taxonomy), a novel, compute-efficient framework that utilizes Document metadata and Domain-Specific Taxonomy as supervision signals to continually pre-train a transformer encoder on a domain-specific corpus. The main innovation is that during domain-specific pretraining, an open-domain encoder is continually pre-trained using sentence-level embeddings as inputs (to accommodate long documents); however, fine-tuning is done with token-level embeddings as inputs to this encoder. We perform such domain-specific pre-training on three different domains, namely customer support, scientific, and legal domains, and compare performance on 6 different downstream tasks and 9 different datasets. The novel use of document-level supervision along with sentence-level embedding input for pre-training reduces pre-training compute by around $1,000$, $4,500$, and $500$ times compared to MLM and/or NSP in Customer Support, Scientific, and Legal Domains, respectively. The reduced training time does not lead to a deterioration in performance. In fact, we show that $FastDoc$ either outperforms or performs on par with several competitive transformer-based baselines in terms of character-level F1 scores and other automated metrics in the Customer Support, Scientific, and Legal Domains. Moreover, reduced training aids in mitigating the risk of catastrophic forgetting. Thus, unlike baselines, $FastDoc$ shows a negligible drop in performance on open domain.

Submitted 1 November, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted to Transactions on Machine Learning Research (TMLR), 36 pages, 8 figures

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2305.19271 [pdf, other]

Concise Answers to Complex Questions: Summarization of Long-form Answers

Authors: Abhilash Potluri, Fangyuan Xu, Eunsol Choi

Abstract: Long-form question answering systems provide rich information by presenting paragraph-level answers, often containing optional background or auxiliary information. While such comprehensive answers are helpful, not all information is required to answer the question (e.g. users with domain knowledge do not need an explanation of background). Can we provide a concise version of the answer by summarizing it, while still addressing the question? We conduct a user study on summarized answers generated from state-of-the-art models and our newly proposed extract-and-decontextualize approach. We find a large proportion of long-form answers (over 90%) in the ELI5 domain can be adequately summarized by at least one system, while complex and implicit answers are challenging to compress. We observe that decontextualization improves the quality of the extractive summary, exemplifying its potential in the summarization task. To promote future work, we provide an extractive summarization dataset covering 1K long-form answers and our user study annotations. Together, we present the first study on summarizing long-form answers, taking a step forward for QA agents that can provide answers at multiple granularities.

Submitted 30 May, 2023; originally announced May 2023.

Comments: ACL 2023 Long Paper

arXiv:2303.10782 [pdf, ps, other]

On the Importance of Signer Overlap for Sign Language Detection

Authors: Abhilash Pal, Stephan Huber, Cyrine Chaabani, Alessandro Manzotti, Oscar Koller

Abstract: Sign language detection, identifying if someone is signing or not, is becoming crucially important for its applications in remote conferencing software and for selecting useful sign data for training sign language recognition or translation tasks. We argue that the current benchmark data sets for sign language detection estimate overly positive results that do not generalize well due to signer overlap between train and test partitions. We quantify this with a detailed analysis of the effect of signer overlap on current sign detection benchmark data sets. Comparing accuracy with and without overlap on the DGS corpus and Signing in the Wild, we observed a relative decrease in accuracy of 4.17% and 6.27%, respectively. Furthermore, we propose new data set partitions that are free of overlap and allow for more realistic performance assessment. We hope this work will contribute to improving the accuracy and generalization of sign language detection systems.

Submitted 19 March, 2023; originally announced March 2023.

arXiv:2212.04781 [pdf, ps, other]

doi 10.1109/CSR54599.2022.9850338

A Bayesian Model Combination-based approach to Active Malware Analysis

Authors: Abhilash Hota, Jurgen Schonwalder

Abstract: Active Malware Analysis involves modeling malware behavior by executing actions to trigger responses and explore multiple execution paths. One of the aims is making the action selection more efficient. This paper treats Active Malware Analysis as a Bayes-Active Markov Decision Process and uses a Bayesian Model Combination approach to train an analyzer agent. We show an improvement in performance against other Bayesian and stochastic approaches to Active Malware Analysis.

Submitted 9 December, 2022; originally announced December 2022.

arXiv:2210.13326 [pdf, other]

Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation

Authors: Subhadeep Dey, Abhilash Pal, Cyrine Chaabani, Oscar Koller

Abstract: This paper describes Microsoft's submission to the first shared task on sign language translation at WMT 2022, a public competition tackling sign language to spoken language translation for Swiss German sign language. The task is very challenging due to data scarcity and an unprecedented vocabulary size of more than 20k words on the target side. Moreover, the data is taken from real broadcast news, includes native signing and covers scenarios of long videos. Motivated by recent advances in action recognition, we incorporate full body information by extracting features from a pre-trained I3D model and applying a standard transformer network. The accuracy of the system is further improved by applying careful data cleaning on the target text. We obtain BLEU scores of 0.6 and 0.78 on the test and dev set respectively, which is the best score among the participants of the shared task. Also in the human evaluation the submission reaches the first place. The BLEU score is further improved to 1.08 on the dev set by applying features extracted from a lip reading model.

Submitted 24 October, 2022; originally announced October 2022.

Comments: accepted for publication at WMT2022

arXiv:2210.12259 [pdf, other]

Enhancing Tabular Reasoning with Pattern Exploiting Training

Authors: Abhilash Reddy Shankarampeta, Vivek Gupta, Shuo Zhang

Abstract: Recent methods based on pre-trained language models have exhibited superior performance over tabular tasks (e.g., tabular NLI), despite showing inherent problems such as not using the right evidence and inconsistent predictions across inputs while reasoning over the tabular data. In this work, we utilize Pattern-Exploiting Training (PET) (i.e., strategic MLM) on pre-trained language models to strengthen these tabular reasoning models' pre-existing knowledge and reasoning abilities. Our upgraded model exhibits a superior understanding of knowledge facts and tabular reasoning compared to current baselines. Additionally, we demonstrate that such models are more effective for underlying downstream tasks of tabular inference on InfoTabs. Furthermore, we show our model's robustness against adversarial sets generated through various character and word level perturbations.

Submitted 21 October, 2022; originally announced October 2022.

Comments: The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing

arXiv:2209.13454 [pdf]

Artificial Intelligence for Cybersecurity: Threats, Attacks and Mitigation

Authors: Abhilash Chakraborty, Anupam Biswas, Ajoy Kumar Khan

Abstract: With the advent of the digital era, every day-to-day task is automated due to technological advances. However, technology has yet to provide people with enough tools and safeguards. As the internet connects more and more devices around the globe, the question of securing the connected devices grows at an ever-spiraling rate. Data thefts, identity thefts, fraudulent transactions, password compromises, and system breaches are becoming regular everyday news. The surging menace of cyber-attacks got a jolt from the recent advancements in Artificial Intelligence. AI is being applied in almost every field of different sciences and engineering. The intervention of AI not only automates a particular task but also improves efficiency manyfold. So it is evident that such a scrumptious spread would be very appetizing to cybercriminals. Thus the conventional cyber threats and attacks are now "intelligent" threats. This article discusses cybersecurity and cyber threats along with both conventional and intelligent ways of defense against cyber-attacks. Finally, it ends the discussion with the potential prospects of the future of AI in cybersecurity.

Submitted 27 September, 2022; originally announced September 2022.

Comments: Submitted to Springer

MSC Class: 68T99 ACM Class: E.3; I.2

arXiv:2209.08172 [pdf, other]

doi 10.1007/978-3-031-37742-6_47

Weakly Supervised Medical Image Segmentation With Soft Labels and Noise Robust Loss

Authors: Banafshe Felfeliyan, Abhilash Hareendranathan, Gregor Kuntze, Stephanie Wichuk, Nils D. Forkert, Jacob L. Jaremko, Janet L. Ronsky

Abstract: Recent advances in deep learning algorithms have led to significant benefits for solving many medical image analysis problems. Training deep learning models commonly requires large datasets with expert-labeled annotations. However, acquiring expert-labeled annotation is not only expensive but also subjective and error-prone, and inter-/intra-observer variability introduces noise to labels. This is particularly a problem when using deep learning models for segmenting medical images due to the ambiguous anatomical boundaries. Image-based medical diagnosis tools using deep learning models trained with incorrect segmentation labels can lead to false diagnoses and treatment suggestions. Multi-rater annotations might be better suited to train deep learning models with small training sets compared to single-rater annotations. The aim of this paper was to develop and evaluate a method to generate probabilistic labels based on multi-rater annotations and anatomical knowledge of the lesion features in MRI and a method to train segmentation models using probabilistic labels using normalized active-passive loss as a "noise-tolerant loss" function. The model was evaluated by comparing it to binary ground truth for 17 knee MRI scans for clinical segmentation and detection of bone marrow lesions (BML). The proposed method successfully improved precision by 14, recall by 22, and Dice score by 8 percent compared to a binary cross-entropy loss function. Overall, the results of this work suggest that the proposed normalized active-passive loss using soft labels successfully mitigated the effects of noisy labels.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2209.06694 [pdf, other]

doi 10.1145/3551349.3561152

Cornucopia: A Framework for Feedback Guided Generation of Binaries

Authors: Vidush Singhal, Akul Abhilash Pillai, Charitha Saumya, Milind Kulkarni, Aravind Machiry

Abstract: Binary analysis is an important capability required for many security and software engineering applications. Consequently, there are many binary analysis techniques and tools with varied capabilities. However, testing these tools requires a large, varied binary dataset with corresponding source-level information. In this paper, we present Cornucopia, an architecture agnostic automated framework that can generate a plethora of binaries from corresponding program source by exploiting compiler optimizations and feedback-guided learning. Our evaluation shows that Cornucopia was able to generate 309K binaries across four architectures (x86, x64, ARM, MIPS) with an average of 403 binaries for each program and outperforms Bintuner, a similar technique. Our experiments revealed issues with the LLVM optimization scheduler resulting in compiler crashes ($\sim$300). Our evaluation of four popular binary analysis tools Angr, Ghidra, Idapro, and Radare, using Cornucopia generated binaries, revealed various issues with these tools. Specifically, we found 263 crashes in Angr and one memory corruption issue in Idapro. Our differential testing on the analysis results revealed various semantic bugs in these tools. We also tested machine learning tools, Asmvec, Safe, and Debin, that claim to capture binary semantics and show that they perform poorly (for instance, Debin F1 score dropped to 12.9% from reported 63.1%) on Cornucopia generated binaries. In summary, our exhaustive evaluation shows that Cornucopia is an effective mechanism to generate binaries for testing binary analysis techniques effectively.

Submitted 14 September, 2022; originally announced September 2022.

Comments: This paper has been accepted at the ASE'22 conference. [37th IEEE/ACM International Conference Automated Software Engineering 2022 (ASE)]

arXiv:2209.05911 [pdf, ps, other]

Computer vision based vehicle tracking as a complementary and scalable approach to RFID tagging

Authors: Pranav Kant Gaur, Abhilash Bhardwaj, Pritam Shete, Mohini Laghate, Dinesh M Sarode

Abstract: Logging of incoming/outgoing vehicles serves as a piece of critical information for root-cause analysis to combat security breach incidents in various sensitive organizations. RFID tagging hampers the scalability of vehicle tracking solutions on both logistical and technical fronts. For instance, requiring each incoming vehicle (departmental or private) to be RFID tagged is a severe constraint and coupling video analytics with RFID to detect abnormal vehicle movement is non-trivial. We leverage publicly available implementations of computer vision algorithms to develop an interpretable vehicle tracking algorithm using finite-state machine formalism. The state-machine consumes input from the cascaded object detection and optical character recognition (OCR) models for state transitions. We evaluated the proposed method on 75 video clips of 285 vehicles from our system deployment site. We observed that the detection rate is most affected by the speed and the type of vehicle. The highest detection rate is achieved when the vehicle movement is restricted to follow movement restrictions (SOP) at the checkpoint similar to RFID tagging. We further analyzed 700 vehicle tracking predictions on live-data and identified that the majority of vehicle number prediction errors are due to illegible-text, image-blur, text occlusion and out-of-vocab letters in vehicle numbers. Towards system deployment and performance enhancement, we expect our ongoing system monitoring to provide evidence to establish a higher vehicle-throughput SOP at the security checkpoint as well as to drive the fine-tuning of the deployed computer-vision models and the state-machine to establish the proposed approach as a promising alternative to RFID-tagging.

Submitted 13 September, 2022; originally announced September 2022.

Showing 1–50 of 101 results for author: Abhilash