Search | arXiv e-print repository

arXiv:2509.19925 [pdf, ps, other]

CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain

Authors: Ajeet Kumar Singh, Rajsabi Surya, Anurag Tripathi, Santanu Choudhury, Sudhir Bisane

Abstract: As enterprises increasingly integrate cloud-based large language models (LLMs) such as ChatGPT and Gemini into their legal document workflows, protecting sensitive contractual information - including Personally Identifiable Information (PII) and commercially sensitive clauses - has emerged as a critical challenge. In this work, we propose CON-QA, a hybrid privacy-preserving framework designed spec… ▽ More As enterprises increasingly integrate cloud-based large language models (LLMs) such as ChatGPT and Gemini into their legal document workflows, protecting sensitive contractual information - including Personally Identifiable Information (PII) and commercially sensitive clauses - has emerged as a critical challenge. In this work, we propose CON-QA, a hybrid privacy-preserving framework designed specifically for secure question answering over enterprise contracts, effectively combining local and cloud-hosted LLMs. The CON-QA framework operates through three stages: (i) semantic query decomposition and query-aware document chunk retrieval using a locally deployed LLM analysis, (ii) anonymization of detected sensitive entities via a structured one-to-many mapping scheme, ensuring semantic coherence while preventing cross-session entity inference attacks, and (iii) anonymized response generation by a cloud-based LLM, with accurate reconstruction of the original answer locally using a session-consistent many-to-one reverse mapping. To rigorously evaluate CON-QA, we introduce CUAD-QA, a corpus of 85k question-answer pairs generated over 510 real-world CUAD contract documents, encompassing simple, complex, and summarization-style queries. Empirical evaluations, complemented by detailed human assessments, confirm that CON-QA effectively maintains both privacy and utility, preserves answer quality, maintains fidelity to legal clause semantics, and significantly mitigates privacy risks, demonstrating its practical suitability for secure, enterprise-level contract documents. △ Less

Submitted 24 September, 2025; originally announced September 2025.

arXiv:2508.14946 [pdf, ps, other]

HHNAS-AM: Hierarchical Hybrid Neural Architecture Search using Adaptive Mutation Policies

Authors: Anurag Tripathi, Ajeet Kumar Singh, Rajsabi Surya, Aum Gupta, Sahiinii Lemaina Veikho, Dorien Herremans, Sudhir Bisane

Abstract: Neural Architecture Search (NAS) has garnered significant research interest due to its capability to discover architectures superior to manually designed ones. Learning text representation is crucial for text classification and other language-related tasks. The NAS model used in text classification does not have a Hybrid hierarchical structure, and there is no restriction on the architecture struc… ▽ More Neural Architecture Search (NAS) has garnered significant research interest due to its capability to discover architectures superior to manually designed ones. Learning text representation is crucial for text classification and other language-related tasks. The NAS model used in text classification does not have a Hybrid hierarchical structure, and there is no restriction on the architecture structure, due to which the search space becomes very large and mostly redundant, so the existing RL models are not able to navigate the search space effectively. Also, doing a flat architecture search leads to an unorganised search space, which is difficult to traverse. For this purpose, we propose HHNAS-AM (Hierarchical Hybrid Neural Architecture Search with Adaptive Mutation Policies), a novel approach that efficiently explores diverse architectural configurations. We introduce a few architectural templates to search on which organise the search spaces, where search spaces are designed on the basis of domain-specific cues. Our method employs mutation strategies that dynamically adapt based on performance feedback from previous iterations using Q-learning, enabling a more effective and accelerated traversal of the search space. The proposed model is fully probabilistic, enabling effective exploration of the search space. We evaluate our approach on the database id (db_id) prediction task, where it consistently discovers high-performing architectures across multiple experiments. On the Spider dataset, our method achieves an 8% improvement in test accuracy over existing baselines. △ Less

Submitted 20 August, 2025; originally announced August 2025.

arXiv:2508.07387 [pdf, ps, other]

MonoMPC: Monocular Vision Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control

Authors: Basant Sharma, Prajyot Jadhav, Pranjal Paul, K. Madhava Krishna, Arun Kumar Singh

Abstract: Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments. We propose an alternative approach: instead of using noisy estim… ▽ More Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments. We propose an alternative approach: instead of using noisy estimated depth for direct collision-checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of minimum obstacle clearance that the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes estimated collision risk. Our joint learning pipeline co-trains the collision model and risk metric using both safe and unsafe trajectories. Crucially, our joint-training ensures optimal variance in our collision model that improves navigation in highly cluttered environments. Consequently, real-world experiments show 9x and 7x improvements in success rates over NoMaD and the ROS stack, respectively. Ablation studies further validate the effectiveness of our design choices. △ Less

Submitted 10 August, 2025; originally announced August 2025.

arXiv:2508.06387 [pdf, ps, other]

End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation

Authors: Anurag Tripathi, Vaibhav Patle, Abhinav Jain, Ayush Pundir, Sairam Menon, Ajeet Kumar Singh, Dorien Herremans

Abstract: Text-to-SQL bridges the gap between natural language and structured database language, thus allowing non-technical users to easily query databases. Traditional approaches model text-to-SQL as a direct translation task, where a given Natural Language Query (NLQ) is mapped to an SQL command. Recent advances in large language models (LLMs) have significantly improved translation accuracy, however, th… ▽ More Text-to-SQL bridges the gap between natural language and structured database language, thus allowing non-technical users to easily query databases. Traditional approaches model text-to-SQL as a direct translation task, where a given Natural Language Query (NLQ) is mapped to an SQL command. Recent advances in large language models (LLMs) have significantly improved translation accuracy, however, these methods all require that the target database is pre-specified. This becomes problematic in scenarios with multiple extensive databases, where identifying the correct database becomes a crucial yet overlooked step. In this paper, we propose a three-stage end-to-end text-to-SQL framework to identify the user's intended database before generating SQL queries. Our approach leverages LLMs and prompt engineering to extract implicit information from natural language queries (NLQs) in the form of a ruleset. We then train a large db\_id prediction model, which includes a RoBERTa-based finetuned encoder, to predict the correct Database identifier (db\_id) based on both the NLQ and the LLM-generated rules. Finally, we refine the generated SQL by using critic agents to correct errors. Experimental results demonstrate that our framework outperforms the current state-of-the-art models in both database intent prediction and SQL generation accuracy. △ Less

Submitted 11 August, 2025; v1 submitted 8 August, 2025; originally announced August 2025.

Comments: Accepted in IJCNN25

arXiv:2508.05661 [pdf, ps, other]

Zero-Shot Retrieval for Scalable Visual Search in a Two-Sided Marketplace

Authors: Andre Rusli, Shoma Ishimoto, Sho Akiyama, Aman Kumar Singh

Abstract: Visual search offers an intuitive way for customers to explore diverse product catalogs, particularly in consumer-to-consumer (C2C) marketplaces where listings are often unstructured and visually driven. This paper presents a scalable visual search system deployed in Mercari's C2C marketplace, where end-users act as buyers and sellers. We evaluate recent vision-language models for zero-shot image… ▽ More Visual search offers an intuitive way for customers to explore diverse product catalogs, particularly in consumer-to-consumer (C2C) marketplaces where listings are often unstructured and visually driven. This paper presents a scalable visual search system deployed in Mercari's C2C marketplace, where end-users act as buyers and sellers. We evaluate recent vision-language models for zero-shot image retrieval and compare their performance with an existing fine-tuned baseline. The system integrates real-time inference and background indexing workflows, supported by a unified embedding pipeline optimized through dimensionality reduction. Offline evaluation using user interaction logs shows that the multilingual SigLIP model outperforms other models across multiple retrieval metrics, achieving a 13.3% increase in nDCG@5 over the baseline. A one-week online A/B test in production further confirms real-world impact, with the treatment group showing substantial gains in engagement and conversion, up to a 40.9% increase in transaction rate via image search. Our findings highlight that recent zero-shot models can serve as a strong and practical baseline for production use, which enables teams to deploy effective visual search systems with minimal overhead, while retaining the flexibility to fine-tune based on future data or domain-specific needs. △ Less

Submitted 31 July, 2025; originally announced August 2025.

Comments: 6 pages, KDD 2025 Workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth (TSMO)

arXiv:2507.18763 [pdf, ps, other]

Diffusion-FS: Multimodal Free-Space Prediction via Diffusion for Autonomous Driving

Authors: Keshav Gupta, Tejas S. Stanley, Pranjal Paul, Arun K. Singh, K. Madhava Krishna

Abstract: Drivable Free-space prediction is a fundamental and crucial problem in autonomous driving. Recent works have addressed the problem by representing the entire non-obstacle road regions as the free-space. In contrast our aim is to estimate the driving corridors that are a navigable subset of the entire road region. Unfortunately, existing corridor estimation methods directly assume a BEV-centric rep… ▽ More Drivable Free-space prediction is a fundamental and crucial problem in autonomous driving. Recent works have addressed the problem by representing the entire non-obstacle road regions as the free-space. In contrast our aim is to estimate the driving corridors that are a navigable subset of the entire road region. Unfortunately, existing corridor estimation methods directly assume a BEV-centric representation, which is hard to obtain. In contrast, we frame drivable free-space corridor prediction as a pure image perception task, using only monocular camera input. However such a formulation poses several challenges as one doesn't have the corresponding data for such free-space corridor segments in the image. Consequently, we develop a novel self-supervised approach for free-space sample generation by leveraging future ego trajectories and front-view camera images, making the process of visual corridor estimation dependent on the ego trajectory. We then employ a diffusion process to model the distribution of such segments in the image. However, the existing binary mask-based representation for a segment poses many limitations. Therefore, we introduce ContourDiff, a specialized diffusion-based architecture that denoises over contour points rather than relying on binary mask representations, enabling structured and interpretable free-space predictions. We evaluate our approach qualitatively and quantitatively on both nuScenes and CARLA, demonstrating its effectiveness in accurately predicting safe multimodal navigable corridors in the image. △ Less

Submitted 24 July, 2025; originally announced July 2025.

Comments: 8 pages, 7 figures, IROS 2025

arXiv:2507.17654 [pdf, ps, other]

On Function-Correcting Codes in the Lee Metric

Authors: Gyanendra K. Verma, Abhay Kumar Singh

Abstract: Function-correcting codes are a coding framework designed to minimize redundancy while ensuring that specific functions or computations of encoded data can be reliably recovered, even in the presence of errors. The choice of metric is crucial in designing such codes, as it determines which computations must be protected and how errors are measured and corrected. Previous work by Liu and Liu [6] st… ▽ More Function-correcting codes are a coding framework designed to minimize redundancy while ensuring that specific functions or computations of encoded data can be reliably recovered, even in the presence of errors. The choice of metric is crucial in designing such codes, as it determines which computations must be protected and how errors are measured and corrected. Previous work by Liu and Liu [6] studied function-correcting codes over $\mathbb{Z}_{2^l},\ l\geq 2$ using the homogeneous metric, which coincides with the Lee metric over $\mathbb{Z}_4$. In this paper, we extend the study to codes over $\mathbb{Z}_m,$ for any positive integer $m\geq 2$ under the Lee metric and aim to determine their optimal redundancy. To achieve this, we introduce irregular Lee distance codes and derive upper and lower bounds on the optimal redundancy by characterizing the shortest possible length of such codes. These general bounds are then simplified and applied to specific classes of functions, including Lee-local functions, Lee weight functions, and Lee weight distribution functions, leading to improved some bounds compared to those of Liu and Liu [6] over $\mathbb{Z}_4$ and generalize the other bounds over $\mathbb{Z}_m$ in the Lee metric. △ Less

Submitted 23 July, 2025; originally announced July 2025.

arXiv:2507.08317 [pdf, ps, other]

doi 10.1109/TNNLS.2025.3577721

A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction

Authors: Jitendra Kumar, Deepika Saxena, Kishu Gupta, Satyam Kumar, Ashutosh Kumar Singh

Abstract: Accurate workload prediction and advanced resource reservation are indispensably crucial for managing dynamic cloud services. Traditional neural networks and deep learning models frequently encounter challenges with diverse, high-dimensional workloads, especially during sudden resource demand changes, leading to inefficiencies. This issue arises from their limited optimization during training, rel… ▽ More Accurate workload prediction and advanced resource reservation are indispensably crucial for managing dynamic cloud services. Traditional neural networks and deep learning models frequently encounter challenges with diverse, high-dimensional workloads, especially during sudden resource demand changes, leading to inefficiencies. This issue arises from their limited optimization during training, relying only on parametric (inter-connection weights) adjustments using conventional algorithms. To address this issue, this work proposes a novel Comprehensively Adaptive Architectural Optimization-based Variable Quantum Neural Network (CA-QNN), which combines the efficiency of quantum computing with complete structural and qubit vector parametric learning. The model converts workload data into qubits, processed through qubit neurons with Controlled NOT-gated activation functions for intuitive pattern recognition. In addition, a comprehensive architecture optimization algorithm for networks is introduced to facilitate the learning and propagation of the structure and parametric values in variable-sized QNNs. This algorithm incorporates quantum adaptive modulation and size-adaptive recombination during training process. The performance of CA-QNN model is thoroughly investigated against seven state-of-the-art methods across four benchmark datasets of heterogeneous cloud workloads. The proposed model demonstrates superior prediction accuracy, reducing prediction errors by up to 93.40% and 91.27% compared to existing deep learning and QNN-based approaches. △ Less

Submitted 11 July, 2025; originally announced July 2025.

arXiv:2506.13935 [pdf, ps, other]

ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture

Authors: Vishesh Kumar Tanwar, Soumik Sarkar, Asheesh K. Singh, Sajal K. Das

Abstract: To empower precision agriculture through distributed machine learning (DML), split learning (SL) has emerged as a promising paradigm, partitioning deep neural networks (DNNs) between edge devices and servers to reduce computational burdens and preserve data privacy. However, conventional SL frameworks' one-split-fits-all strategy is a critical limitation in agricultural ecosystems where edge insec… ▽ More To empower precision agriculture through distributed machine learning (DML), split learning (SL) has emerged as a promising paradigm, partitioning deep neural networks (DNNs) between edge devices and servers to reduce computational burdens and preserve data privacy. However, conventional SL frameworks' one-split-fits-all strategy is a critical limitation in agricultural ecosystems where edge insect monitoring devices exhibit vast heterogeneity in computational power, energy constraints, and connectivity. This leads to straggler bottlenecks, inefficient resource utilization, and compromised model performance. Bridging this gap, we introduce ReinDSplit, a novel reinforcement learning (RL)-driven framework that dynamically tailors DNN split points for each device, optimizing efficiency without sacrificing accuracy. Specifically, a Q-learning agent acts as an adaptive orchestrator, balancing workloads and latency thresholds across devices to mitigate computational starvation or overload. By framing split layer selection as a finite-state Markov decision process, ReinDSplit convergence ensures that highly constrained devices contribute meaningfully to model training over time. Evaluated on three insect classification datasets using ResNet18, GoogleNet, and MobileNetV2, ReinDSplit achieves 94.31% accuracy with MobileNetV2. Beyond agriculture, ReinDSplit pioneers a paradigm shift in SL by harmonizing RL for resource efficiency, privacy, and scalability in heterogeneous environments. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13253 [pdf, ps, other]

Distinct Computations Emerge From Compositional Curricula in In-Context Learning

Authors: Jin Hwa Lee, Andrew K. Lampinen, Aaditya K. Singh, Andrew M. Saxe

Abstract: In-context learning (ICL) research often considers learning a function in-context through a uniform sample of input-output pairs. Here, we investigate how presenting a compositional subtask curriculum in context may alter the computations a transformer learns. We design a compositional algorithmic task based on the modular exponential-a double exponential task composed of two single exponential su… ▽ More In-context learning (ICL) research often considers learning a function in-context through a uniform sample of input-output pairs. Here, we investigate how presenting a compositional subtask curriculum in context may alter the computations a transformer learns. We design a compositional algorithmic task based on the modular exponential-a double exponential task composed of two single exponential subtasks and train transformer models to learn the task in-context. We compare (a) models trained using an in-context curriculum consisting of single exponential subtasks and, (b) models trained directly on the double exponential task without such a curriculum. We show that models trained with a subtask curriculum can perform zero-shot inference on unseen compositional tasks and are more robust given the same context length. We study how the task and subtasks are represented across the two training regimes. We find that the models employ diverse strategies modulated by the specific curriculum design. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.06023 [pdf, ps, other]

Restereo: Diffusion stereo video generation and restoration

Authors: Xingchang Huang, Ashish Kumar Singh, Florian Dubost, Cristina Nader Vasconcelos, Sakar Khattar, Liang Shi, Christian Theobalt, Cengiz Oztireli, Gurprit Singh

Abstract: Stereo video generation has been gaining increasing attention with recent advancements in video diffusion models. However, most existing methods focus on generating 3D stereoscopic videos from monocular 2D videos. These approaches typically assume that the input monocular video is of high quality, making the task primarily about inpainting occluded regions in the warped video while preserving diso… ▽ More Stereo video generation has been gaining increasing attention with recent advancements in video diffusion models. However, most existing methods focus on generating 3D stereoscopic videos from monocular 2D videos. These approaches typically assume that the input monocular video is of high quality, making the task primarily about inpainting occluded regions in the warped video while preserving disoccluded areas. In this paper, we introduce a new pipeline that not only generates stereo videos but also enhances both left-view and right-view videos consistently with a single model. Our approach achieves this by fine-tuning the model on degraded data for restoration, as well as conditioning the model on warped masks for consistent stereo generation. As a result, our method can be fine-tuned on a relatively small synthetic stereo video datasets and applied to low-quality real-world videos, performing both stereo video generation and restoration. Experiments demonstrate that our method outperforms existing approaches both qualitatively and quantitatively in stereo video generation from low-resolution inputs. △ Less

Submitted 6 June, 2025; originally announced June 2025.

Comments: 12 pages, 5 figures

arXiv:2506.03182 [pdf, ps, other]

TerraIncognita: A Dynamic Benchmark for Species Discovery Using Frontier Models

Authors: Shivani Chiranjeevi, Hossein Zaremehrjerdi, Zi K. Deng, Talukder Z. Jubery, Ari Grele, Arti Singh, Asheesh K Singh, Soumik Sarkar, Nirav Merchant, Harold F. Greeney, Baskar Ganapathysubramanian, Chinmay Hegde

Abstract: The rapid global loss of biodiversity, particularly among insects, represents an urgent ecological crisis. Current methods for insect species discovery are manual, slow, and severely constrained by taxonomic expertise, hindering timely conservation actions. We introduce TerraIncognita, a dynamic benchmark designed to evaluate state-of-the-art multimodal models for the challenging problem of identi… ▽ More The rapid global loss of biodiversity, particularly among insects, represents an urgent ecological crisis. Current methods for insect species discovery are manual, slow, and severely constrained by taxonomic expertise, hindering timely conservation actions. We introduce TerraIncognita, a dynamic benchmark designed to evaluate state-of-the-art multimodal models for the challenging problem of identifying unknown, potentially undescribed insect species from image data. Our benchmark dataset combines a mix of expertly annotated images of insect species likely known to frontier AI models, and images of rare and poorly known species, for which few/no publicly available images exist. These images were collected from underexplored biodiversity hotspots, realistically mimicking open-world discovery scenarios faced by ecologists. The benchmark assesses models' proficiency in hierarchical taxonomic classification, their capability to detect and abstain from out-of-distribution (OOD) samples representing novel species, and their ability to generate explanations aligned with expert taxonomic knowledge. Notably, top-performing models achieve over 90\% F1 at the Order level on known species, but drop below 2\% at the Species level, highlighting the sharp difficulty gradient from coarse to fine taxonomic prediction (Order $\rightarrow$ Family $\rightarrow$ Genus $\rightarrow$ Species). TerraIncognita will be updated regularly, and by committing to quarterly dataset expansions (of both known and novel species), will provide an evolving platform for longitudinal benchmarking of frontier AI methods. All TerraIncognita data, results, and future updates are available \href{https://baskargroup.github.io/TerraIncognita/}{here}. △ Less

Submitted 29 May, 2025; originally announced June 2025.

arXiv:2506.00450 [pdf, ps, other]

DV365: Extremely Long User History Modeling at Instagram

Authors: Wenhan Lyu, Devashish Tyagi, Yihang Yang, Ziwei Li, Ajay Somani, Karthikeyan Shanmugasundaram, Nikola Andrejevic, Ferdi Adeputra, Curtis Zeng, Arun K. Singh, Maxime Ransan, Sagar Jain

Abstract: Long user history is highly valuable signal for recommendation systems, but effectively incorporating it often comes with high cost in terms of data center power consumption and GPU. In this work, we chose offline embedding over end-to-end sequence length optimization methods to enable extremely long user sequence modeling as a cost-effective solution, and propose a new user embedding learning str… ▽ More Long user history is highly valuable signal for recommendation systems, but effectively incorporating it often comes with high cost in terms of data center power consumption and GPU. In this work, we chose offline embedding over end-to-end sequence length optimization methods to enable extremely long user sequence modeling as a cost-effective solution, and propose a new user embedding learning strategy, multi-slicing and summarization, that generates highly generalizable user representation of user's long-term stable interest. History length we encoded in this embedding is up to 70,000 and on average 40,000. This embedding, named as DV365, is proven highly incremental on top of advanced attentive user sequence models deployed in Instagram. Produced by a single upstream foundational model, it is launched in 15 different models across Instagram and Threads with significant impact, and has been production battle-proven for >1 year since our first launch. △ Less

Submitted 31 May, 2025; originally announced June 2025.

Comments: SIGKDD 2025 accepted

arXiv:2505.19259 [pdf, ps, other]

Towards Large Reasoning Models for Agriculture

Authors: Hossein Zaremehrjerdi, Shreyan Ganguly, Ashlyn Rairdin, Elizabeth Tranel, Benjamin Feuer, Juan Ignacio Di Salvo, Srikanth Panthulugiri, Hernan Torres Pacin, Victoria Moser, Sarah Jones, Joscif G Raigne, Yanben Shen, Heidi M. Dornath, Aditya Balu, Adarsh Krishnamurthy, Asheesh K Singh, Arti Singh, Baskar Ganapathysubramanian, Chinmay Hegde, Soumik Sarkar

Abstract: Agricultural decision-making involves complex, context-specific reasoning, where choices about crops, practices, and interventions depend heavily on geographic, climatic, and economic conditions. Traditional large language models (LLMs) often fall short in navigating this nuanced problem due to limited reasoning capacity. We hypothesize that recent advances in large reasoning models (LRMs) can bet… ▽ More Agricultural decision-making involves complex, context-specific reasoning, where choices about crops, practices, and interventions depend heavily on geographic, climatic, and economic conditions. Traditional large language models (LLMs) often fall short in navigating this nuanced problem due to limited reasoning capacity. We hypothesize that recent advances in large reasoning models (LRMs) can better handle such structured, domain-specific inference. To investigate this, we introduce AgReason, the first expert-curated open-ended science benchmark with 100 questions for agricultural reasoning. Evaluations across thirteen open-source and proprietary models reveal that LRMs outperform conventional ones, though notable challenges persist, with the strongest Gemini-based baseline achieving 36% accuracy. We also present AgThoughts, a large-scale dataset of 44.6K question-answer pairs generated with human oversight and equipped with synthetically generated reasoning traces. Using AgThoughts, we develop AgThinker, a suite of small reasoning models that can be run on consumer-grade GPUs, and show that our dataset can be effective in unlocking agricultural reasoning abilities in LLMs. Our project page is here: https://baskargroup.github.io/Ag_reasoning/ △ Less

Submitted 27 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

arXiv:2505.18930 [pdf, other]

WeedNet: A Foundation Model-Based Global-to-Local AI Approach for Real-Time Weed Species Identification and Classification

Authors: Yanben Shen, Timilehin T. Ayanlade, Venkata Naresh Boddepalli, Mojdeh Saadati, Ashlyn Rairdin, Zi K. Deng, Muhammad Arbab Arshad, Aditya Balu, Daren Mueller, Asheesh K Singh, Wesley Everman, Nirav Merchant, Baskar Ganapathysubramanian, Meaghan Anderson, Soumik Sarkar, Arti Singh

Abstract: Early identification of weeds is essential for effective management and control, and there is growing interest in automating the process using computer vision techniques coupled with AI methods. However, challenges associated with training AI-based weed identification models, such as limited expert-verified data and complexity and variability in morphological features, have hindered progress. To a… ▽ More Early identification of weeds is essential for effective management and control, and there is growing interest in automating the process using computer vision techniques coupled with AI methods. However, challenges associated with training AI-based weed identification models, such as limited expert-verified data and complexity and variability in morphological features, have hindered progress. To address these issues, we present WeedNet, the first global-scale weed identification model capable of recognizing an extensive set of weed species, including noxious and invasive plant species. WeedNet is an end-to-end real-time weed identification pipeline and uses self-supervised learning, fine-tuning, and enhanced trustworthiness strategies. WeedNet achieved 91.02% accuracy across 1,593 weed species, with 41% species achieving 100% accuracy. Using a fine-tuning strategy and a Global-to-Local approach, the local Iowa WeedNet model achieved an overall accuracy of 97.38% for 85 Iowa weeds, most classes exceeded a 90% mean accuracy per class. Testing across intra-species dissimilarity (developmental stages) and inter-species similarity (look-alike species) suggests that diversity in the images collected, spanning all the growth stages and distinguishable plant characteristics, is crucial in driving model performance. The generalizability and adaptability of the Global WeedNet model enable it to function as a foundational model, with the Global-to-Local strategy allowing fine-tuning for region-specific weed communities. Additional validation of drone- and ground-rover-based images highlights the potential of WeedNet for integration into robotic platforms. Furthermore, integration with AI for conversational use provides intelligent agricultural and ecological conservation consulting tools for farmers, agronomists, researchers, land managers, and government agencies across diverse landscapes. △ Less

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.14751 [pdf, ps, other]

Self Distillation via Iterative Constructive Perturbations

Authors: Maheak Dave, Aniket Kumar Singh, Aryan Pareek, Harshita Jha, Debasis Chaudhuri, Manish Pratap Singh

Abstract: Deep Neural Networks have achieved remarkable achievements across various domains, however balancing performance and generalization still remains a challenge while training these networks. In this paper, we propose a novel framework that uses a cyclic optimization strategy to concurrently optimize the model and its input data for better training, rethinking the traditional training paradigm. Centr… ▽ More Deep Neural Networks have achieved remarkable achievements across various domains, however balancing performance and generalization still remains a challenge while training these networks. In this paper, we propose a novel framework that uses a cyclic optimization strategy to concurrently optimize the model and its input data for better training, rethinking the traditional training paradigm. Central to our approach is Iterative Constructive Perturbation (ICP), which leverages the model's loss to iteratively perturb the input, progressively constructing an enhanced representation over some refinement steps. This ICP input is then fed back into the model to produce improved intermediate features, which serve as a target in a self-distillation framework against the original features. By alternately altering the model's parameters to the data and the data to the model, our method effectively addresses the gap between fitting and generalization, leading to enhanced performance. Extensive experiments demonstrate that our approach not only mitigates common performance bottlenecks in neural networks but also demonstrates significant improvements across training variations. △ Less

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.09473 [pdf, ps, other]

Function-Correcting $b$-symbol Codes for Locally $(λ, ρ,b)$-Functions

Authors: Gyanendra K. Verma, Anamika Singh, Abhay Kumar Singh

Abstract: The family of functions plays a central role in the design and effectiveness of function-correcting codes. By focusing on a well-defined family of functions, function-correcting codes can be constructed with minimal length while still ensuring full error detection and correction within that family. In this work, we explore the concept of locally $(λ,ρ)$-functions for $b$-symbol read channels and i… ▽ More The family of functions plays a central role in the design and effectiveness of function-correcting codes. By focusing on a well-defined family of functions, function-correcting codes can be constructed with minimal length while still ensuring full error detection and correction within that family. In this work, we explore the concept of locally $(λ,ρ)$-functions for $b$-symbol read channels and investigate the optimal redundancy of the corresponding function-correcting $b$-symbol codes (FCBSC) by introducing the notions of locally $(λ,ρ,b)$-functions. First, we discuss the values of $λ$ and $ρ$ for which a function can be considered as a locally $(λ,ρ)$-function in $b$-symbol metric. The findings improve some known results in the Hamming metric and present several new results in the $b$-symbol metric. Then we investigate the optimal redundancy of $(f,t)$-FCBSCs for locally $(λ,ρ,b)$-functions. We establish a recurrence relation between the optimal redundancy of $(f,t)$-function-correcting codes for the $(b+1)$-symbol read and $b$-symbol read channels. We present an upper bound on the optimal redundancy of $(f,t)$-function-correcting $b$-symbol codes for general locally ($λ,ρ$, $b$)-functions by associating it to the minimum achievable length of $b$-symbol error-correcting codes and traditional Hamming-metric codes, given a fixed number of codewords and a specified minimum distance. We derive some explicit upper bounds on the redundancy of $(f,t)$-function-correcting $b$-symbol codes for locally $(λ,2t,b)$-functions. Moreover, for the case where $b=1$, we show that a locally ($3,2t,1$)-function achieves the optimal redundancy of $3t$. Additionally, we explicitly investigate the locality and optimal redundancy of FCBSCs for the $b$-symbol weight function and weight distribution function for $b\geq1$. △ Less

Submitted 29 September, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.01215 [pdf, other]

doi 10.1109/TII.2025.3540498

A Self-Healing and Fault-Tolerant Cloud-based Digital Twin Processing Management Model

Authors: Deepika Saxena, Ashutosh Kumar Singh

Abstract: Digital twins, integral to cloud platforms, bridge physical and virtual worlds, fostering collaboration among stakeholders in manufacturing and processing. However, the cloud platforms face challenges like service outages, vulnerabilities, and resource contention, hindering critical digital twin application development. The existing research works have limited focus on reliability and fault tolera… ▽ More Digital twins, integral to cloud platforms, bridge physical and virtual worlds, fostering collaboration among stakeholders in manufacturing and processing. However, the cloud platforms face challenges like service outages, vulnerabilities, and resource contention, hindering critical digital twin application development. The existing research works have limited focus on reliability and fault tolerance in digital twin processing. In this context, this paper proposed a novel Self-healing and Faulttolerant cloud-based Digital Twin processing Management (SF-DTM) model. It employs collaborative digital twin tasks resource requirement estimation unit which utilizes newly devised Federated learning with cosine Similarity integration (SimiFed). Further, SF-DTM incorporates a self-healing fault-tolerance strategy employing a frequent sequence fault-prone pattern analytics unit for deciding the most admissible VM allocation. The implementation and evaluation of SF-DTM model using real traces demonstrates its effectiveness and resilience, revealing improved availability, higher Mean Time Between Failure (MTBF), and lower Mean Time To Repair (MTTR) compared with non-SF-DTM approaches, enhancing collaborative DT application management. SF-DTM improved the services availability up to 13.2% over non-SF-DTM-based DT processing. △ Less

Submitted 2 May, 2025; originally announced May 2025.

Comments: 10 pages, 7 figures

Journal ref: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS 2025

arXiv:2504.14674 [pdf, ps, other]

Binary cyclic codes from permutation polynomials over $\mathbb{F}_{2^m}$

Authors: Mrinal Kanti Bose, Udaya Parampalli, Abhay Kumar Singh

Abstract: Binary cyclic codes having large dimensions and minimum distances close to the square-root bound are highly valuable in applications where high-rate transmission and robust error correction are both essential. They provide an optimal trade-off between these two factors, making them suitable for demanding communication and storage systems, post-quantum cryptography, radar and sonar systems, wireles… ▽ More Binary cyclic codes having large dimensions and minimum distances close to the square-root bound are highly valuable in applications where high-rate transmission and robust error correction are both essential. They provide an optimal trade-off between these two factors, making them suitable for demanding communication and storage systems, post-quantum cryptography, radar and sonar systems, wireless sensor networks, and space communications. This paper aims to investigate cyclic codes by an efficient approach introduced by Ding \cite{SETA5} from several known classes of permutation monomials and trinomials over $\mathbb{F}_{2^m}$. We present several infinite families of binary cyclic codes of length $2^m-1$ with dimensions larger than $(2^m-1)/2$. By applying the Hartmann-Tzeng bound, some of the lower bounds on the minimum distances of these cyclic codes are relatively close to the square root bound. Moreover, we obtain a new infinite family of optimal binary cyclic codes with parameters $[2^m-1,2^m-2-3m,8]$, where $m\geq 5$ is odd, according to the sphere-packing bound. △ Less

Submitted 20 April, 2025; originally announced April 2025.

MSC Class: 94B15; 11T71; 11T06 ACM Class: B.4.1; H.1.1

arXiv:2504.10962 [pdf, ps, other]

doi 10.1109/LRA.2025.3568323

$π$-MPPI: A Projection-based Model Predictive Path Integral Scheme for Smooth Optimal Control of Fixed-Wing Aerial Vehicles

Authors: Edvin Martin Andrejev, Amith Manoharan, Karl-Eerik Unt, Arun Kumar Singh

Abstract: Model Predictive Path Integral (MPPI) is a popular sampling-based Model Predictive Control (MPC) algorithm for nonlinear systems. It optimizes trajectories by sampling control sequences and averaging them. However, a key issue with MPPI is the non-smoothness of the optimal control sequence, leading to oscillations in systems like fixed-wing aerial vehicles (FWVs). Existing solutions use post-hoc s… ▽ More Model Predictive Path Integral (MPPI) is a popular sampling-based Model Predictive Control (MPC) algorithm for nonlinear systems. It optimizes trajectories by sampling control sequences and averaging them. However, a key issue with MPPI is the non-smoothness of the optimal control sequence, leading to oscillations in systems like fixed-wing aerial vehicles (FWVs). Existing solutions use post-hoc smoothing, which fails to bound control derivatives. This paper introduces a new approach: we add a projection filter $π$ to minimally correct control samples, ensuring bounds on control magnitude and higher-order derivatives. The filtered samples are then averaged using MPPI, leading to our $π$-MPPI approach. We minimize computational overhead by using a neural accelerated custom optimizer for the projection filter. $π$-MPPI offers a simple way to achieve arbitrary smoothness in control sequences. While we focus on FWVs, this projection filter can be integrated into any MPPI pipeline. Applied to FWVs, $π$-MPPI is easier to tune than the baseline, resulting in smoother, more robust performance. △ Less

Submitted 16 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

Comments: 8 pages, 4 figures, submitted to IEEE RA-L

Journal ref: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 10, NO. 6, JUNE 2025

arXiv:2504.10088 [pdf, ps, other]

Code size constraints in b-symbol read channels: A bound analysis

Authors: Gyanendra K. Verma, Nupur Patanker, Abhay Kumar Singh

Abstract: In classical coding theory, error-correcting codes are designed to protect against errors occurring at individual symbol positions in a codeword. However, in practical storage and communication systems, errors often affect multiple adjacent symbols rather than single symbols independently. To address this, symbol-pair read channels were introduced \cite{Yuval2011}, and later generalized to $b$-sym… ▽ More In classical coding theory, error-correcting codes are designed to protect against errors occurring at individual symbol positions in a codeword. However, in practical storage and communication systems, errors often affect multiple adjacent symbols rather than single symbols independently. To address this, symbol-pair read channels were introduced \cite{Yuval2011}, and later generalized to $b$-symbol read channels \cite{yaakobi2016} to better model such error patterns. $b$-Symbol read channels generalize symbol-pair read channels to account for clustered errors in modern storage and communication systems. By developing bounds and efficient codes, researchers improve data reliability in applications such as storage devices, wireless networks, and DNA-based storage. Given integers $q$, $n$, $d$, and $b \geq 2$, let $A_b(n,d,q)$ denote the largest possible code size for which there exists a $q$-ary code of length $n$ with minimum $b$-symbol distance at least $d$. In \cite{chen2022}, various upper and lower bounds on $A_b(n,d,q)$ are given for $b=2$. In this paper, we generalize some of these bounds to the $b$-symbol read channels for $b>2$ and present several new bounds on $A_b(n,d,q)$. In particular, we establish the linear programming bound, a recurrence relation on $A_b(n,d,q)$, the Johnson bound (even), the restricted Johnson bound, the Gilbert-Varshamov-type bound, and the Elias bound for the metric of symbols $b$, $b\geq 2$. Furthermore, we provide examples demonstrating that the Gilbert-Varshamov bound we establish offers a stronger lower bound than the one presented in \cite{Song2018}. Additionally, we introduce an alternative approach to deriving the Sphere-packing and Plotkin bounds. △ Less

Submitted 22 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.05918 [pdf, other]

Deep RL-based Autonomous Navigation of Micro Aerial Vehicles (MAVs) in a complex GPS-denied Indoor Environment

Authors: Amit Kumar Singh, Prasanth Kumar Duba, P. Rajalakshmi

Abstract: The Autonomy of Unmanned Aerial Vehicles (UAVs) in indoor environments poses significant challenges due to the lack of reliable GPS signals in enclosed spaces such as warehouses, factories, and indoor facilities. Micro Aerial Vehicles (MAVs) are preferred for navigating in these complex, GPS-denied scenarios because of their agility, low power consumption, and limited computational capabilities. I… ▽ More The Autonomy of Unmanned Aerial Vehicles (UAVs) in indoor environments poses significant challenges due to the lack of reliable GPS signals in enclosed spaces such as warehouses, factories, and indoor facilities. Micro Aerial Vehicles (MAVs) are preferred for navigating in these complex, GPS-denied scenarios because of their agility, low power consumption, and limited computational capabilities. In this paper, we propose a Reinforcement Learning based Deep-Proximal Policy Optimization (D-PPO) algorithm to enhance realtime navigation through improving the computation efficiency. The end-to-end network is trained in 3D realistic meta-environments created using the Unreal Engine. With these trained meta-weights, the MAV system underwent extensive experimental trials in real-world indoor environments. The results indicate that the proposed method reduces computational latency by 91\% during training period without significant degradation in performance. The algorithm was tested on a DJI Tello drone, yielding similar results. △ Less

Submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.03850 [pdf, other]

Detection Limits and Statistical Separability of Tree Ring Watermarks in Rectified Flow-based Text-to-Image Generation Models

Authors: Ved Umrajkar, Aakash Kumar Singh

Abstract: Tree-Ring Watermarking is a significant technique for authenticating AI-generated images. However, its effectiveness in rectified flow-based models remains unexplored, particularly given the inherent challenges of these models with noise latent inversion. Through extensive experimentation, we evaluated and compared the detection and separability of watermarks between SD 2.1 and FLUX.1-dev models.… ▽ More Tree-Ring Watermarking is a significant technique for authenticating AI-generated images. However, its effectiveness in rectified flow-based models remains unexplored, particularly given the inherent challenges of these models with noise latent inversion. Through extensive experimentation, we evaluated and compared the detection and separability of watermarks between SD 2.1 and FLUX.1-dev models. By analyzing various text guidance configurations and augmentation attacks, we demonstrate how inversion limitations affect both watermark recovery and the statistical separation between watermarked and unwatermarked images. Our findings provide valuable insights into the current limitations of Tree-Ring Watermarking in the current SOTA models and highlight the critical need for improved inversion methods to achieve reliable watermark detection and separability. The official implementation, dataset release and all experimental results are available at this \href{https://github.com/dsgiitr/flux-watermarking}{\textbf{link}}. △ Less

Submitted 4 April, 2025; originally announced April 2025.

arXiv:2503.19508 [pdf, other]

Improved Alignment of Modalities in Large Vision Language Models

Authors: Kartik Jangra, Aman Kumar Singh, Yashwani Mann, Geetanjali Rathee

Abstract: Recent advancements in vision-language models have achieved remarkable results in making language models understand vision inputs. However, a unified approach to align these models across diverse tasks such as image captioning and visual question answering remains a challenge. Existing methods either require very big language models or very big datasets which is not efficient in utilizing existing… ▽ More Recent advancements in vision-language models have achieved remarkable results in making language models understand vision inputs. However, a unified approach to align these models across diverse tasks such as image captioning and visual question answering remains a challenge. Existing methods either require very big language models or very big datasets which is not efficient in utilizing existing models. This paper addresses this gap and devises a training strategy of auto-regressive vision-language models, to unify vision-language tasks like image-captioning and visual question answering. We propose four training stages for aligning the vision model with the language model, in other words, the language model is given an ability to process visual inputs. We also devise different attention masks for training transformer-based language models that improve the quality of visual features. Further, we introduce some findings, 1) the attention mask should not be applied on visual inputs, 2) the Language model converges faster on AI- generated data, 3) More work should be done in the alignment stage during the pre-training of the model, 4) the model can easily adapt to any downstream tasks like visual question answering on healthcare datasets like PathVQA. After training the model for one epoch for all the stages, it outperforms large models like VILA-13 billion models on common benchmarks like CIDEr scores on COCO and Flickr30k datasets and achieves very close scores to GIT-2 on the same dataset despite being a much smaller model trained on a much smaller dataset. All of the training is done using best practices available like multi- GPU parallel training, lower-precision training with 16-bit float numbers, faster attention (SDPA), and gradient accumulation, and completed the training within 12 hours. △ Less

Submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.17985 [pdf, other]

Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree

Authors: Mahsa Khosravi, Zhanhong Jiang, Joshua R Waite, Sarah Jonesc, Hernan Torres, Arti Singh, Baskar Ganapathysubramanian, Asheesh Kumar Singh, Soumik Sarkar

Abstract: This paper presents a novel reinforcement learning (RL)-based planning scheme for optimized robotic management of biotic stresses in precision agriculture. The framework employs a hierarchical decision-making structure with conditional action masking, where high-level actions direct the robot's exploration, while low-level actions optimize its navigation and efficient chemical spraying in affected… ▽ More This paper presents a novel reinforcement learning (RL)-based planning scheme for optimized robotic management of biotic stresses in precision agriculture. The framework employs a hierarchical decision-making structure with conditional action masking, where high-level actions direct the robot's exploration, while low-level actions optimize its navigation and efficient chemical spraying in affected areas. The key objectives of optimization include improving the coverage of infected areas with limited battery power and reducing chemical usage, thus preventing unnecessary spraying of healthy areas of the field. Our numerical experimental results demonstrate that the proposed method, Hierarchical Action Masking Proximal Policy Optimization (HAM-PPO), significantly outperforms baseline practices, such as LawnMower navigation + indiscriminate spraying (Carpet Spray), in terms of yield recovery and resource efficiency. HAM-PPO consistently achieves higher yield recovery percentages and lower chemical costs across a range of infection scenarios. The framework also exhibits robustness to observation noise and generalizability under diverse environmental conditions, adapting to varying infection ranges and spatial distribution patterns. △ Less

Submitted 23 March, 2025; originally announced March 2025.

Comments: 32 pages, 9 figures

arXiv:2503.12894 [pdf, other]

Function-Correcting Codes for b-Symbol Read Channels

Authors: Anamika Singh, Abhay Kumar Singh, Eitan Yaakobi

Abstract: Function-correcting codes are an innovative class of codes that are designed to protect a function evaluation of the data against errors or corruptions. Due to its usefulness in machine learning applications and archival data storage, where preserving the integrity of computation is crucial, Lenz et al. recently introduced function-correcting codes for binary symmetric channels to safeguard functi… ▽ More Function-correcting codes are an innovative class of codes that are designed to protect a function evaluation of the data against errors or corruptions. Due to its usefulness in machine learning applications and archival data storage, where preserving the integrity of computation is crucial, Lenz et al. recently introduced function-correcting codes for binary symmetric channels to safeguard function evaluation against errors. Xia et al. expanded this concept to symbol-pair read channels over binary fields. The current paper further advances the theory by developing function-correcting codes for b-symbol read channels over finite fields. We introduce the idea of irregular b-symbol distance codes and establish bounds on their performance over finite fields. This concept helps in understanding the behavior of function-correcting codes in more complex settings. We also present a graphical approach of the problem of constructing function-correcting b-symbol codes. Furthermore, we apply these general concepts to specific classes of functions and compare the redundancy of function-correcting b-symbol codes with classical b-symbol codes. Our findings demonstrate that function-correcting b-symbol codes achieve lower redundancy while maintaining reliability. △ Less

Submitted 17 March, 2025; originally announced March 2025.

Comments: 28 pages, 4 figures

arXiv:2503.10486 [pdf, ps, other]

LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions

Authors: Gaurav Kumar Gupta, Pranal Pande, Nirajan Acharya, Aniket Kumar Singh, Suman Niroula

Abstract: Large Language Models (LLMs) are revolutionizing medical diagnostics by enhancing both disease classification and clinical decision-making. In this study, we evaluate the performance of two LLM- based diagnostic tools, DeepSeek R1 and O3 Mini, using a structured dataset of symptoms and diagnoses. We assessed their predictive accuracy at both the disease and category levels, as well as the reliabil… ▽ More Large Language Models (LLMs) are revolutionizing medical diagnostics by enhancing both disease classification and clinical decision-making. In this study, we evaluate the performance of two LLM- based diagnostic tools, DeepSeek R1 and O3 Mini, using a structured dataset of symptoms and diagnoses. We assessed their predictive accuracy at both the disease and category levels, as well as the reliability of their confidence scores. DeepSeek R1 achieved a disease-level accuracy of 76% and an overall accuracy of 82%, outperforming O3 Mini, which attained 72% and 75% respectively. Notably, DeepSeek R1 demonstrated exceptional performance in Mental Health, Neurological Disorders, and Oncology, where it reached 100% accuracy, while O3 Mini excelled in Autoimmune Disease classification with 100% accuracy. Both models, however, struggled with Respiratory Disease classification, recording accuracies of only 40% for DeepSeek R1 and 20% for O3 Mini. Additionally, the analysis of confidence scores revealed that DeepSeek R1 provided high-confidence predictions in 92% of cases, compared to 68% for O3 Mini. Ethical considerations regarding bias, model interpretability, and data privacy are also discussed to ensure the responsible integration of LLMs into clinical practice. Overall, our findings offer valuable insights into the strengths and limitations of LLM-based diagnostic systems and provide a roadmap for future enhancements in AI-driven healthcare. △ Less

Submitted 19 June, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

Comments: 12 pages, 3 figures

arXiv:2503.05631 [pdf, other]

Strategy Coopetition Explains the Emergence and Transience of In-Context Learning

Authors: Aaditya K. Singh, Ted Moskovitz, Sara Dragutinovic, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe

Abstract: In-context learning (ICL) is a powerful ability that emerges in transformer models, enabling them to learn from context without weight updates. Recent work has established emergent ICL as a transient phenomenon that can sometimes disappear after long training times. In this work, we sought a mechanistic understanding of these transient dynamics. Firstly, we find that, after the disappearance of IC… ▽ More In-context learning (ICL) is a powerful ability that emerges in transformer models, enabling them to learn from context without weight updates. Recent work has established emergent ICL as a transient phenomenon that can sometimes disappear after long training times. In this work, we sought a mechanistic understanding of these transient dynamics. Firstly, we find that, after the disappearance of ICL, the asymptotic strategy is a remarkable hybrid between in-weights and in-context learning, which we term "context-constrained in-weights learning" (CIWL). CIWL is in competition with ICL, and eventually replaces it as the dominant strategy of the model (thus leading to ICL transience). However, we also find that the two competing strategies actually share sub-circuits, which gives rise to cooperative dynamics as well. For example, in our setup, ICL is unable to emerge quickly on its own, and can only be enabled through the simultaneous slow development of asymptotic CIWL. CIWL thus both cooperates and competes with ICL, a phenomenon we term "strategy coopetition." We propose a minimal mathematical model that reproduces these key dynamics and interactions. Informed by this model, we were able to identify a setup where ICL is truly emergent and persistent. △ Less

Submitted 10 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

Comments: 20 pages, 18 figures

arXiv:2502.18339 [pdf, other]

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks

Authors: Rylan Schaeffer, Punit Singh Koura, Binh Tang, Ranjan Subramanian, Aaditya K Singh, Todor Mihaylov, Prajjwal Bhargava, Lovish Madaan, Niladri S. Chatterji, Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang

Abstract: The explosion of high-performing conversational language models (LMs) has spurred a shift from classic natural language processing (NLP) benchmarks to expensive, time-consuming and noisy human evaluations - yet the relationship between these two evaluation strategies remains hazy. In this paper, we conduct a large-scale study of four Chat Llama 2 models, comparing their performance on 160 standard… ▽ More The explosion of high-performing conversational language models (LMs) has spurred a shift from classic natural language processing (NLP) benchmarks to expensive, time-consuming and noisy human evaluations - yet the relationship between these two evaluation strategies remains hazy. In this paper, we conduct a large-scale study of four Chat Llama 2 models, comparing their performance on 160 standard NLP benchmarks (e.g., MMLU, ARC, BIG-Bench Hard) against extensive human preferences on more than 11k single-turn and 2k multi-turn dialogues from over 2k human annotators. Our findings are striking: most NLP benchmarks strongly correlate with human evaluations, suggesting that cheaper, automated metrics can serve as surprisingly reliable predictors of human preferences. Three human evaluations, such as adversarial dishonesty and safety, are anticorrelated with NLP benchmarks, while two are uncorrelated. Moreover, through overparameterized linear regressions, we show that NLP scores can accurately predict human evaluations across different model scales, offering a path to reduce costly human annotation without sacrificing rigor. Overall, our results affirm the continued value of classic benchmarks and illuminate how to harness them to anticipate real-world user satisfaction - pointing to how NLP benchmarks can be leveraged to meet evaluation needs of our new era of conversational AI. △ Less

Submitted 23 February, 2025; originally announced February 2025.

arXiv:2502.03149 [pdf, other]

doi 10.1109/TSMC.2025.3525956

Secure Resource Management in Cloud Computing: Challenges, Strategies and Meta-Analysis

Authors: Deepika Saxena, Smruti Rekha Swain, Jatinder Kumar, Sakshi Patni, Kishu Gupta, Ashutosh Kumar Singh, Volker Lindenstruth

Abstract: Secure resource management (SRM) within a cloud computing environment is a critical yet infrequently studied research topic. This paper provides a comprehensive survey and comparative performance evaluation of potential cyber threat countermeasure strategies that address security challenges during cloud workload execution and resource management. Cybersecurity is explored specifically in the conte… ▽ More Secure resource management (SRM) within a cloud computing environment is a critical yet infrequently studied research topic. This paper provides a comprehensive survey and comparative performance evaluation of potential cyber threat countermeasure strategies that address security challenges during cloud workload execution and resource management. Cybersecurity is explored specifically in the context of cloud resource management, with an emphasis on identifying the associated challenges. The cyber threat countermeasure methods are categorized into three classes: defensive strategies, mitigating strategies, and hybrid strategies. The existing countermeasure strategies belonging to each class are thoroughly discussed and compared. In addition to conceptual and theoretical analysis, the leading countermeasure strategies within these categories are implemented on a common platform and examined using two real-world virtual machine (VM) data traces. Based on this comprehensive study and performance evaluation, the paper discusses the trade-offs among these countermeasure strategies and their utility, providing imperative concluding remarks on the holistic study of cloud cyber threat countermeasures and secure resource management. Furthermore, the study suggests future methodologies that could effectively address the emerging challenges of secure cloud resource management. △ Less

Submitted 5 February, 2025; originally announced February 2025.

Comments: 16 Pages, 12 Figures, 6 Tables, in IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025

arXiv:2501.19045 [pdf, other]

doi 10.1109/LRA.2025.3565335

Trajectory Optimization Under Stochastic Dynamics Leveraging Maximum Mean Discrepancy

Authors: Basant Sharma, Arun Kumar Singh

Abstract: This paper addresses sampling-based trajectory optimization for risk-aware navigation under stochastic dynamics. Typically such approaches operate by computing $\tilde{N}$ perturbed rollouts around the nominal dynamics to estimate the collision risk associated with a sequence of control commands. We consider a setting where it is expensive to estimate risk using perturbed rollouts, for example, du… ▽ More This paper addresses sampling-based trajectory optimization for risk-aware navigation under stochastic dynamics. Typically such approaches operate by computing $\tilde{N}$ perturbed rollouts around the nominal dynamics to estimate the collision risk associated with a sequence of control commands. We consider a setting where it is expensive to estimate risk using perturbed rollouts, for example, due to expensive collision-checks. We put forward two key contributions. First, we develop an algorithm that distills the statistical information from a larger set of rollouts to a reduced-set with sample size $N<<\tilde{N}$. Consequently, we estimate collision risk using just $N$ rollouts instead of $\tilde{N}$. Second, we formulate a novel surrogate for the collision risk that can leverage the distilled statistical information contained in the reduced-set. We formalize both algorithmic contributions using distribution embedding in Reproducing Kernel Hilbert Space (RKHS) and Maximum Mean Discrepancy (MMD). We perform extensive benchmarking to demonstrate that our MMD-based approach leads to safer trajectories at low sample regime than existing baselines using Conditional Value-at Risk (CVaR) based collision risk estimate. △ Less

Submitted 10 April, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: https://github.com/Basant1861/MPC-MMD

arXiv:2501.19042 [pdf, other]

Swarm-Gen: Fast Generation of Diverse Feasible Swarm Behaviors

Authors: Simon Idoko, B. Bhanu Teja, K. Madhava Krishna, Arun Kumar Singh

Abstract: Coordination behavior in robot swarms is inherently multi-modal in nature. That is, there are numerous ways in which a swarm of robots can avoid inter-agent collisions and reach their respective goals. However, the problem of generating diverse and feasible swarm behaviors in a scalable manner remains largely unaddressed. In this paper, we fill this gap by combining generative models with a safety… ▽ More Coordination behavior in robot swarms is inherently multi-modal in nature. That is, there are numerous ways in which a swarm of robots can avoid inter-agent collisions and reach their respective goals. However, the problem of generating diverse and feasible swarm behaviors in a scalable manner remains largely unaddressed. In this paper, we fill this gap by combining generative models with a safety-filter (SF). Specifically, we sample diverse trajectories from a learned generative model which is subsequently projected onto the feasible set using the SF. We experiment with two choices for generative models, namely: Conditional Variational Autoencoder (CVAE) and Vector-Quantized Variational Autoencoder (VQ-VAE). We highlight the trade-offs these two models provide in terms of computation time and trajectory diversity. We develop a custom solver for our SF and equip it with a neural network that predicts context-specific initialization. Thecinitialization network is trained in a self-supervised manner, taking advantage of the differentiability of the SF solver. We provide two sets of empirical results. First, we demonstrate that we can generate a large set of multi-modal, feasible trajectories, simulating diverse swarm behaviors, within a few tens of milliseconds. Second, we show that our initialization network provides faster convergence of our SF solver vis-a-vis other alternative heuristics. △ Less

Submitted 31 January, 2025; originally announced January 2025.

Comments: Submitted to RAL

arXiv:2501.16265 [pdf, other]

Training Dynamics of In-Context Learning in Linear Attention

Authors: Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe

Abstract: While attention-based models have demonstrated the remarkable ability of in-context learning (ICL), the theoretical understanding of how these models acquired this ability through gradient descent training is still preliminary. Towards answering this question, we study the gradient descent dynamics of multi-head linear self-attention trained for in-context linear regression. We examine two paramet… ▽ More While attention-based models have demonstrated the remarkable ability of in-context learning (ICL), the theoretical understanding of how these models acquired this ability through gradient descent training is still preliminary. Towards answering this question, we study the gradient descent dynamics of multi-head linear self-attention trained for in-context linear regression. We examine two parametrizations of linear self-attention: one with the key and query weights merged as a single matrix (common in theoretical studies), and one with separate key and query matrices (closer to practical settings). For the merged parametrization, we show that the training dynamics has two fixed points and the loss trajectory exhibits a single, abrupt drop. We derive an analytical time-course solution for a certain class of datasets and initialization. For the separate parametrization, we show that the training dynamics has exponentially many fixed points and the loss exhibits saddle-to-saddle dynamics, which we reduce to scalar ordinary differential equations. During training, the model implements principal component regression in context with the number of principal components increasing over training time. Overall, we provide a theoretical description of how ICL abilities evolve during gradient descent training of linear attention, revealing abrupt acquisition or progressive improvements depending on how the key and query are parametrized. △ Less

Submitted 27 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

Comments: ICML 2025 Spotlight

arXiv:2501.11477 [pdf]

Image Classification Method using Dynamic Quantum Inspired Genetic Algorithm

Authors: Akhilesh Kumar Singh, Kirankumar R. Hiremath

Abstract: This study presents a dynamic Quantum-Inspired Genetic Algorithm (D-QIGA) for feature selection, leveraging quantum principles like superposition and rotation gates to enhance exploration and exploitation. D-QIGA introduces adaptive mechanisms and a lengthening chromosome strategy to avoid local optima and improve optimization. Tested on benchmark and real-world problems, it significantly outperfo… ▽ More This study presents a dynamic Quantum-Inspired Genetic Algorithm (D-QIGA) for feature selection, leveraging quantum principles like superposition and rotation gates to enhance exploration and exploitation. D-QIGA introduces adaptive mechanisms and a lengthening chromosome strategy to avoid local optima and improve optimization. Tested on benchmark and real-world problems, it significantly outperforms traditional Genetic Algorithms, achieving over 99.99% classification accuracy compared to GA's 95%. △ Less

Submitted 4 April, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

arXiv:2412.14495 [pdf, other]

doi 10.1016/j.asoc.2024.111519

FedMUP: Federated Learning driven Malicious User Prediction Model for Secure Data Distribution in Cloud Environments

Authors: Kishu Gupta, Deepika Saxena, Rishabh Gupta, Jatinder Kumar, Ashutosh Kumar Singh

Abstract: Cloud computing is flourishing at a rapid pace. Significant consequences related to data security appear as a malicious user may get unauthorized access to sensitive data which may be misused, further. This raises an alarm-ringing situation to tackle the crucial issue related to data security and proactive malicious user prediction. This article proposes a Federated learning driven Malicious User… ▽ More Cloud computing is flourishing at a rapid pace. Significant consequences related to data security appear as a malicious user may get unauthorized access to sensitive data which may be misused, further. This raises an alarm-ringing situation to tackle the crucial issue related to data security and proactive malicious user prediction. This article proposes a Federated learning driven Malicious User Prediction Model for Secure Data Distribution in Cloud Environments (FedMUP). This approach firstly analyses user behavior to acquire multiple security risk parameters. Afterward, it employs the federated learning-driven malicious user prediction approach to reveal doubtful users, proactively. FedMUP trains the local model on their local dataset and transfers computed values rather than actual raw data to obtain an updated global model based on averaging various local versions. This updated model is shared repeatedly at regular intervals with the user for retraining to acquire a better, and more efficient model capable of predicting malicious users more precisely. Extensive experimental work and comparison of the proposed model with state-of-the-art approaches demonstrate the efficiency of the proposed work. Significant improvement is observed in the key performance indicators such as malicious user prediction accuracy, precision, recall, and f1-score up to 14.32%, 17.88%, 14.32%, and 18.35%, respectively. △ Less

Submitted 18 December, 2024; originally announced December 2024.

Comments: 33 pages, 9 figures

Journal ref: Fedmup: Federated learning driven malicious user prediction model for secure data distribution in cloud environments, Applied Soft Computing, vol. 157, p. 111519, 2024

arXiv:2412.14490 [pdf, other]

doi 10.1007/s10586-023-04263-9

MAIDS: Malicious Agent Identification-based Data Security Model for Cloud Environments

Authors: Kishu Gupta, Deepika Saxena, Rishabh Gupta, Ashutosh Kumar Singh

Abstract: With the vigorous development of cloud computing, most organizations have shifted their data and applications to the cloud environment for storage, computation, and sharing purposes. During storage and data sharing across the participating entities, a malicious agent may gain access to outsourced data from the cloud environment. A malicious agent is an entity that deliberately breaches the data. T… ▽ More With the vigorous development of cloud computing, most organizations have shifted their data and applications to the cloud environment for storage, computation, and sharing purposes. During storage and data sharing across the participating entities, a malicious agent may gain access to outsourced data from the cloud environment. A malicious agent is an entity that deliberately breaches the data. This information accessed might be misused or revealed to unauthorized parties. Therefore, data protection and prediction of malicious agents have become a demanding task that needs to be addressed appropriately. To deal with this crucial and challenging issue, this paper presents a Malicious Agent Identification-based Data Security (MAIDS) Model which utilizes XGBoost machine learning classification algorithm for securing data allocation and communication among different participating entities in the cloud system. The proposed model explores and computes intended multiple security parameters associated with online data communication or transactions. Correspondingly, a security-focused knowledge database is produced for developing the XGBoost Classifier-based Malicious Agent Prediction (XC-MAP) unit. Unlike the existing approaches, which only identify malicious agents after data leaks, MAIDS proactively identifies malicious agents by examining their eligibility for respective data access. In this way, the model provides a comprehensive solution to safeguard crucial data from both intentional and non-intentional breaches, by granting data to authorized agents only by evaluating the agents behavior and predicting the malicious agent before granting data. △ Less

Submitted 18 December, 2024; originally announced December 2024.

Comments: 28 pages, 10 figures

Journal ref: Cluster Comput 27, 6167 to 6184, (2024)

arXiv:2412.09696 [pdf, other]

Soybean Maturity Prediction using 2D Contour Plots from Drone based Time Series Imagery

Authors: Bitgoeul Kim, Samuel W. Blair, Talukder Z. Jubery, Soumik Sarkar, Arti Singh, Asheesh K. Singh, Baskar Ganapathysubramanian

Abstract: Plant breeding programs require assessments of days to maturity for accurate selection and placement of entries in appropriate tests. In the early stages of the breeding pipeline, soybean breeding programs assign relative maturity ratings to experimental varieties that indicate their suitable maturity zones. Traditionally, the estimation of maturity value for breeding varieties has involved breede… ▽ More Plant breeding programs require assessments of days to maturity for accurate selection and placement of entries in appropriate tests. In the early stages of the breeding pipeline, soybean breeding programs assign relative maturity ratings to experimental varieties that indicate their suitable maturity zones. Traditionally, the estimation of maturity value for breeding varieties has involved breeders manually inspecting fields and assessing maturity value visually. This approach relies heavily on rater judgment, making it subjective and time-consuming. This study aimed to develop a machine-learning model for evaluating soybean maturity using UAV-based time-series imagery. Images were captured at three-day intervals, beginning as the earliest varieties started maturing and continuing until the last varieties fully matured. The data collected for this experiment consisted of 22,043 plots collected across three years (2021 to 2023) and represent relative maturity groups 1.6 - 3.9. We utilized contour plot images extracted from the time-series UAV RGB imagery as input for a neural network model. This contour plot approach encoded the temporal and spatial variation within each plot into a single image. A deep learning model was trained to utilize this contour plot to predict maturity ratings. This model significantly improves accuracy and robustness, achieving up to 85% accuracy. We also evaluate the model's accuracy as we reduce the number of time points, quantifying the trade-off between temporal resolution and maturity prediction. The predictive model offers a scalable, objective, and efficient means of assessing crop maturity, enabling phenomics and ML approaches to reduce the reliance on manual inspection and subjective assessment. This approach enables the automatic prediction of relative maturity ratings in a breeding program, saving time and resources. △ Less

Submitted 12 December, 2024; originally announced December 2024.

arXiv:2412.09121 [pdf, ps, other]

MMD-OPT : Maximum Mean Discrepancy Based Sample Efficient Collision Risk Minimization for Autonomous Driving

Authors: Basant Sharma, Arun Kumar Singh

Abstract: We propose MMD-OPT: a sample-efficient approach for minimizing the risk of collision under arbitrary prediction distribution of the dynamic obstacles. MMD-OPT is based on embedding distribution in Reproducing Kernel Hilbert Space (RKHS) and the associated Maximum Mean Discrepancy (MMD). We show how these two concepts can be used to define a sample efficient surrogate for collision risk estimate. W… ▽ More We propose MMD-OPT: a sample-efficient approach for minimizing the risk of collision under arbitrary prediction distribution of the dynamic obstacles. MMD-OPT is based on embedding distribution in Reproducing Kernel Hilbert Space (RKHS) and the associated Maximum Mean Discrepancy (MMD). We show how these two concepts can be used to define a sample efficient surrogate for collision risk estimate. We perform extensive simulations to validate the effectiveness of MMD-OPT on both synthetic and real-world datasets. Importantly, we show that trajectory optimization with our MMD-based collision risk surrogate leads to safer trajectories at low sample regimes than popular alternatives based on Conditional Value at Risk (CVaR). △ Less

Submitted 7 July, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

arXiv:2412.08819 [pdf, other]

HARP: A challenging human-annotated math reasoning benchmark

Authors: Albert S. Yue, Lovish Madaan, Ted Moskovitz, DJ Strouse, Aaditya K. Singh

Abstract: Math reasoning is becoming an ever increasing area of focus as we scale large language models. However, even the previously-toughest evals like MATH are now close to saturated by frontier models (90.0% for o1-mini and 86.5% for Gemini 1.5 Pro). We introduce HARP, Human Annotated Reasoning Problems (for Math), consisting of 5,409 problems from the US national math competitions (A(J)HSME, AMC, AIME,… ▽ More Math reasoning is becoming an ever increasing area of focus as we scale large language models. However, even the previously-toughest evals like MATH are now close to saturated by frontier models (90.0% for o1-mini and 86.5% for Gemini 1.5 Pro). We introduce HARP, Human Annotated Reasoning Problems (for Math), consisting of 5,409 problems from the US national math competitions (A(J)HSME, AMC, AIME, USA(J)MO). Of these, 4,780 have answers that are automatically check-able (with libraries such as SymPy). These problems range six difficulty levels, with frontier models performing relatively poorly on the hardest bracket of 197 problems (average accuracy 41.1% for o1-mini, and 9.6% for Gemini 1.5 Pro). Our dataset also features multiple choices (for 4,110 problems) and an average of two human-written, ground-truth solutions per problem, offering new avenues of research that we explore briefly. We report evaluations for many frontier models and share some interesting analyses, such as demonstrating that frontier models across families intrinsically scale their inference-time compute for more difficult problems. Finally, we open source all code used for dataset construction (including scraping) and all code for evaluation (including answer checking) to enable future research at: https://github.com/aadityasingh/HARP. △ Less

Submitted 11 December, 2024; originally announced December 2024.

Comments: 28 pages, 17 figures

arXiv:2412.03782 [pdf, ps, other]

The broader spectrum of in-context learning

Authors: Andrew Kyle Lampinen, Stephanie C. Y. Chan, Aaditya K. Singh, Murray Shanahan

Abstract: The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can b… ▽ More The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning within a much broader spectrum of meta-learned in-context learning. Indeed, we suggest that any distribution of sequences in which context non-trivially decreases loss on subsequent predictions can be interpreted as eliciting a kind of in-context learning. We suggest that this perspective helps to unify the broad set of in-context abilities that language models exhibit -- such as adapting to tasks from instructions or role play, or extrapolating time series. This perspective also sheds light on potential roots of in-context learning in lower-level processing of linguistic dependencies (e.g. coreference or parallel structures). Finally, taking this perspective highlights the importance of generalization, which we suggest can be studied along several dimensions: not only the ability to learn something novel, but also flexibility in learning from different presentations, and in applying what is learned. We discuss broader connections to past literature in meta-learning and goal-conditioned agents, and other perspectives on learning and adaptation. We close by suggesting that research on in-context learning should consider this broader spectrum of in-context capabilities and types of generalization. △ Less

Submitted 5 June, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

arXiv:2412.02642 [pdf, other]

Robust soybean seed yield estimation using high-throughput ground robot videos

Authors: Jiale Feng, Samuel W. Blair, Timilehin Ayanlade, Aditya Balu, Baskar Ganapathysubramanian, Arti Singh, Soumik Sarkar, Asheesh K Singh

Abstract: We present a novel method for soybean (Glycine max (L.) Merr.) yield estimation leveraging high throughput seed counting via computer vision and deep learning techniques. Traditional methods for collecting yield data are labor-intensive, costly, prone to equipment failures at critical data collection times, and require transportation of equipment across field sites. Computer vision, the field of t… ▽ More We present a novel method for soybean (Glycine max (L.) Merr.) yield estimation leveraging high throughput seed counting via computer vision and deep learning techniques. Traditional methods for collecting yield data are labor-intensive, costly, prone to equipment failures at critical data collection times, and require transportation of equipment across field sites. Computer vision, the field of teaching computers to interpret visual data, allows us to extract detailed yield information directly from images. By treating it as a computer vision task, we report a more efficient alternative, employing a ground robot equipped with fisheye cameras to capture comprehensive videos of soybean plots from which images are extracted in a variety of development programs. These images are processed through the P2PNet-Yield model, a deep learning framework where we combined a Feature Extraction Module (the backbone of the P2PNet-Soy) and a Yield Regression Module to estimate seed yields of soybean plots. Our results are built on three years of yield testing plot data - 8500 in 2021, 2275 in 2022, and 650 in 2023. With these datasets, our approach incorporates several innovations to further improve the accuracy and generalizability of the seed counting and yield estimation architecture, such as the fisheye image correction and data augmentation with random sensor effects. The P2PNet-Yield model achieved a genotype ranking accuracy score of up to 83%. It demonstrates up to a 32% reduction in time to collect yield data as well as costs associated with traditional yield estimation, offering a scalable solution for breeding programs and agricultural productivity enhancement. △ Less

Submitted 3 December, 2024; originally announced December 2024.

Comments: 23 pages, 12 figures, 2 tables

arXiv:2412.01354 [pdf]

Integrative CAM: Adaptive Layer Fusion for Comprehensive Interpretation of CNNs

Authors: Aniket K. Singh, Debasis Chaudhuri, Manish P. Singh, Samiran Chattopadhyay

Abstract: With the growing demand for interpretable deep learning models, this paper introduces Integrative CAM, an advanced Class Activation Mapping (CAM) technique aimed at providing a holistic view of feature importance across Convolutional Neural Networks (CNNs). Traditional gradient-based CAM methods, such as Grad-CAM and Grad-CAM++, primarily use final layer activations to highlight regions of interes… ▽ More With the growing demand for interpretable deep learning models, this paper introduces Integrative CAM, an advanced Class Activation Mapping (CAM) technique aimed at providing a holistic view of feature importance across Convolutional Neural Networks (CNNs). Traditional gradient-based CAM methods, such as Grad-CAM and Grad-CAM++, primarily use final layer activations to highlight regions of interest, often neglecting critical features derived from intermediate layers. Integrative CAM addresses this limitation by fusing insights across all network layers, leveraging both gradient and activation scores to adaptively weight layer contributions, thus yielding a comprehensive interpretation of the model's internal representation. Our approach includes a novel bias term in the saliency map calculation, a factor frequently omitted in existing CAM techniques, but essential for capturing a more complete feature importance landscape, as modern CNNs rely on both weighted activations and biases to make predictions. Additionally, we generalize the alpha term from Grad-CAM++ to apply to any smooth function, expanding CAM applicability across a wider range of models. Through extensive experiments on diverse and complex datasets, Integrative CAM demonstrates superior fidelity in feature importance mapping, effectively enhancing interpretability for intricate fusion scenarios and complex decision-making tasks. By advancing interpretability methods to capture multi-layered model insights, Integrative CAM provides a valuable tool for fusion-driven applications, promoting the trustworthy and insightful deployment of deep learning models. △ Less

Submitted 2 December, 2024; originally announced December 2024.

arXiv:2411.03923 [pdf, other]

Evaluation data contamination in LLMs: how do we measure it and (when) does it matter?

Authors: Aaditya K. Singh, Muhammed Yusuf Kocyigit, Andrew Poulton, David Esiobu, Maria Lomeli, Gergely Szilvasy, Dieuwke Hupkes

Abstract: Hampering the interpretation of benchmark scores, evaluation data contamination has become a growing concern in the evaluation of LLMs, and an active area of research studies its effects. While evaluation data contamination is easily understood intuitively, it is surprisingly difficult to define precisely which samples should be considered contaminated and, consequently, how it impacts benchmark s… ▽ More Hampering the interpretation of benchmark scores, evaluation data contamination has become a growing concern in the evaluation of LLMs, and an active area of research studies its effects. While evaluation data contamination is easily understood intuitively, it is surprisingly difficult to define precisely which samples should be considered contaminated and, consequently, how it impacts benchmark scores. We propose that these questions should be addressed together and that contamination metrics can be assessed based on whether models benefit from the examples they mark contaminated. We propose a novel analysis method called ConTAM, and show with a large scale survey of existing and novel n-gram based contamination metrics across 13 benchmarks and 7 models from 2 different families that ConTAM can be used to better understand evaluation data contamination and its effects. We find that contamination may have a much larger effect than reported in recent LLM releases and benefits models differently at different scales. We also find that considering only the longest contaminated substring provides a better signal than considering a union of all contaminated substrings, and that doing model and benchmark specific threshold analysis greatly increases the specificity of the results. Lastly, we investigate the impact of hyperparameter choices, finding that, among other things, both using larger values of n and disregarding matches that are infrequent in the pre-training data lead to many false negatives. With ConTAM, we provide a method to empirically ground evaluation data contamination metrics in downstream effects. With our exploration, we shed light on how evaluation data contamination can impact LLMs and provide insight into the considerations important when doing contamination analysis. We end our paper by discussing these in more detail and providing concrete suggestions for future work. △ Less

Submitted 6 November, 2024; originally announced November 2024.

arXiv:2410.19712 [pdf, other]

DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control

Authors: Md Faizal Karim, Shreya Bollimuntha, Mohammed Saad Hashmi, Autrio Das, Gaurav Singh, Srinath Sridhar, Arun Kumar Singh, Nagamanikandan Govindan, K Madhava Krishna

Abstract: Dual-arm manipulation is an area of growing interest in the robotics community. Enabling robots to perform tasks that require the coordinated use of two arms, is essential for complex manipulation tasks such as handling large objects, assembling components, and performing human-like interactions. However, achieving effective dual-arm manipulation is challenging due to the need for precise coordina… ▽ More Dual-arm manipulation is an area of growing interest in the robotics community. Enabling robots to perform tasks that require the coordinated use of two arms, is essential for complex manipulation tasks such as handling large objects, assembling components, and performing human-like interactions. However, achieving effective dual-arm manipulation is challenging due to the need for precise coordination, dynamic adaptability, and the ability to manage interaction forces between the arms and the objects being manipulated. We propose a novel pipeline that combines the advantages of policy learning based on environment feedback and gradient-based optimization to learn controller gains required for the control outputs. This allows the robotic system to dynamically modulate its impedance in response to task demands, ensuring stability and dexterity in dual-arm operations. We evaluate our pipeline on a trajectory-tracking task involving a variety of large, complex objects with different masses and geometries. The performance is then compared to three other established methods for controlling dual-arm robots, demonstrating superior results. △ Less

Submitted 25 October, 2024; originally announced October 2024.

arXiv:2410.18751 [pdf, ps, other]

Double Auctions: Formalization and Automated Checkers

Authors: Mohit Garg, N. Raja, Suneel Sarswat, Abhishek Kr Singh

Abstract: Double auctions are widely used in financial markets, such as those for stocks, derivatives, currencies, and commodities, to match demand and supply. Once all buyers and sellers have placed their trade requests, the exchange determines how these requests are to be matched. The two most common objectives for determining the matching are maximizing trade volume at a uniform price and maximizing trad… ▽ More Double auctions are widely used in financial markets, such as those for stocks, derivatives, currencies, and commodities, to match demand and supply. Once all buyers and sellers have placed their trade requests, the exchange determines how these requests are to be matched. The two most common objectives for determining the matching are maximizing trade volume at a uniform price and maximizing trade volume through dynamic pricing. Prior research has primarily focused on single-quantity trade requests. In this work, we extend the framework to handle multiple-quantity trade requests and present fully formalized matching algorithms for double auctions, along with their correctness proofs. We establish new uniqueness theorems, enabling automatic detection of violations in exchange systems by comparing their output to that of a verified program. All proofs are formalized in the Coq Proof Assistant, and we extract verified OCaml and Haskell programs that could serve as a resource for exchanges and market regulators. We demonstrate the practical applicability of our work by running the verified program on real market data from an exchange to automatically check for violations in the exchange algorithm. △ Less

Submitted 24 October, 2024; originally announced October 2024.

Comments: 23 pages, Preliminary version of this work was published in ITP 2021

ACM Class: F.3.1; K.4.4

arXiv:2410.18494 [pdf, other]

Assured Automatic Programming via Large Language Models

Authors: Martin Mirchev, Andreea Costea, Abhishek Kr Singh, Abhik Roychoudhury

Abstract: With the advent of AI-based coding engines, it is possible to convert natural language requirements to executable code in standard programming languages. However, AI-generated code can be unreliable, and the natural language requirements driving this code may be ambiguous. In other words, the intent may not be accurately captured in the code generated from AI-coding engines like Copilot. The goal… ▽ More With the advent of AI-based coding engines, it is possible to convert natural language requirements to executable code in standard programming languages. However, AI-generated code can be unreliable, and the natural language requirements driving this code may be ambiguous. In other words, the intent may not be accurately captured in the code generated from AI-coding engines like Copilot. The goal of our work is to discover the programmer intent, while generating code which conforms to the intent and a proof of this conformance. Our approach to intent discovery is powered by a novel repair engine called program-proof co-evolution, where the object of repair is a tuple (code, logical specification, test) generated by an LLM from the same natural language description. The program and the specification capture the initial operational and declarative description of intent, while the test represents a concrete, albeit partial, understanding of the intent. Our objective is to achieve consistency between the program, the specification, and the test by incrementally refining our understanding of the user intent. Reaching consistency through this repair process provides us with a formal, logical description of the intent, which is then translated back into natural language for the developer's inspection. The resultant intent description is now unambiguous, though expressed in natural language. We demonstrate how the unambiguous intent discovered through our approach increases the percentage of verifiable auto-generated programs on a recently proposed dataset in the Dafny programming language. △ Less

Submitted 4 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

arXiv:2410.09339 [pdf]

Advanced Gesture Recognition for Autism Spectrum Disorder Detection: Integrating YOLOv7, Video Augmentation, and VideoMAE for Naturalistic Video Analysis

Authors: Amit Kumar Singh, Vrijendra Singh

Abstract: Deep learning and contactless sensing technologies have significantly advanced the automated assessment of human behaviors in healthcare. In the context of autism spectrum disorder (ASD), repetitive motor behaviors such as spinning, head banging, and arm flapping are key indicators for diagnosis. This study focuses on distinguishing between children with ASD and typically developed (TD) peers by a… ▽ More Deep learning and contactless sensing technologies have significantly advanced the automated assessment of human behaviors in healthcare. In the context of autism spectrum disorder (ASD), repetitive motor behaviors such as spinning, head banging, and arm flapping are key indicators for diagnosis. This study focuses on distinguishing between children with ASD and typically developed (TD) peers by analyzing videos captured in natural, uncontrolled environments. Using the publicly available Self-Stimulatory Behavior Dataset (SSBD), we address the classification task as a binary problem, ASD vs. TD, based on stereotypical repetitive gestures. We adopt a pipeline integrating YOLOv7-based detection, extensive video augmentations, and the VideoMAE framework, which efficiently captures both spatial and temporal features through a high-ratio masking and reconstruction strategy. Our proposed approach achieves 95% accuracy, 0.93 precision, 0.94 recall, and 0.94 F1 score, surpassing the previous state-of-the-art by a significant margin. These results demonstrate the effectiveness of combining advanced object detection, robust data augmentation, and masked autoencoder-based video modeling for reliable ASD vs. TD classification in naturalistic settings. △ Less

Submitted 17 August, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

Comments: Change Note for Version 3 - Extended Study (ASD vs TD Classification) This version extends v2 from 3-class gesture recognition to binary ASD vs TD detection, using expanded SSBD variants, a new TD class, improved preprocessing, and updated metrics (95% acc, 0.93 prec, 0.94 rec, 0.94 F1). Methodology remains YOLOv7 + VideoMAE + augmentation

arXiv:2410.03621 [pdf, other]

doi 10.1109/TCE.2024.3373912

A Global Medical Data Security and Privacy Preserving Standards Identification Framework for Electronic Healthcare Consumers

Authors: Vinaytosh Mishra, Kishu Gupta, Deepika Saxena, Ashutosh Kumar Singh

Abstract: Electronic Health Records (EHR) are crucial for the success of digital healthcare, with a focus on putting consumers at the center of this transformation. However, the digitalization of healthcare records brings along security and privacy risks for personal data. The major concern is that different countries have varying standards for the security and privacy of medical data. This paper proposed a… ▽ More Electronic Health Records (EHR) are crucial for the success of digital healthcare, with a focus on putting consumers at the center of this transformation. However, the digitalization of healthcare records brings along security and privacy risks for personal data. The major concern is that different countries have varying standards for the security and privacy of medical data. This paper proposed a novel and comprehensive framework to standardize these rules globally, bringing them together on a common platform. To support this proposal, the study reviews existing literature to understand the research interest in this issue. It also examines six key laws and standards related to security and privacy, identifying twenty concepts. The proposed framework utilized K-means clustering to categorize these concepts and identify five key factors. Finally, an Ordinal Priority Approach is applied to determine the preferred implementation of these factors in the context of EHRs. The proposed study provides a descriptive then prescriptive framework for the implementation of privacy and security in the context of electronic health records. Therefore, the findings of the proposed framework are useful for professionals and policymakers in improving the security and privacy associated with EHRs. △ Less

Submitted 4 October, 2024; originally announced October 2024.

Journal ref: A Global Medical Data Security and Privacy Preserving Standards Identification Framework for Electronic Healthcare Consumers, in IEEE Transactions on Consumer Electronics, vol. 70, no. 1, pp. 4379-4387, Feb. 2024

arXiv:2410.03217 [pdf, other]

doi 10.1109/TASE.2024.3456209

An Intelligent Quantum Cyber-Security Framework for Healthcare Data Management

Authors: Kishu Gupta, Deepika Saxena, Pooja Rani, Jitendra Kumar, Aaisha Makkar, Ashutosh Kumar Singh, Chung-Nan Lee

Abstract: Digital healthcare is essential to facilitate consumers to access and disseminate their medical data easily for enhanced medical care services. However, the significant concern with digitalization across healthcare systems necessitates for a prompt, productive, and secure storage facility along with a vigorous communication strategy, to stimulate sensitive digital healthcare data sharing and proac… ▽ More Digital healthcare is essential to facilitate consumers to access and disseminate their medical data easily for enhanced medical care services. However, the significant concern with digitalization across healthcare systems necessitates for a prompt, productive, and secure storage facility along with a vigorous communication strategy, to stimulate sensitive digital healthcare data sharing and proactive estimation of malicious entities. In this context, this paper introduces a comprehensive quantum-based framework to overwhelm the potential security and privacy issues for secure healthcare data management. It equips quantum encryption for the secured storage and dispersal of healthcare data over the shared cloud platform by employing quantum encryption. Also, the framework furnishes a quantum feed-forward neural network unit to examine the intention behind the data request before granting access, for proactive estimation of potential data breach. In this way, the proposed framework delivers overall healthcare data management by coupling the advanced and more competent quantum approach with machine learning to safeguard the data storage, access, and prediction of malicious entities in an automated manner. Thus, the proposed IQ-HDM leads to more cooperative and effective healthcare delivery and empowers individuals with adequate custody of their health data. The experimental evaluation and comparison of the proposed IQ-HDM framework with state-of-the-art methods outline a considerable improvement up to 67.6%, in tackling cyber threats related to healthcare data security. △ Less

Submitted 4 October, 2024; originally announced October 2024.

Journal ref: IEEE Transactions on Automation Science and Engineering (2024)

arXiv:2409.16011 [pdf, other]

CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation

Authors: Naman Kumar, Antareep Singha, Laksh Nanwani, Dhruv Potdar, Tarun R, Fatemeh Rastgar, Simon Idoko, Arun Kumar Singh, K. Madhava Krishna

Abstract: Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the prior computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by just improving the local planner. Our approach combines generative modelling with inference time optimization to ge… ▽ More Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the prior computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by just improving the local planner. Our approach combines generative modelling with inference time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector Quantized Variational AutoEncoder to learn a prior over the expert trajectory distribution conditioned on the perception input. At run-time, this is used as an initialization for a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time. △ Less

Submitted 7 March, 2025; v1 submitted 24 September, 2024; originally announced September 2024.

Comments: Accepted at IEEE ICRA 2025

Showing 1–50 of 268 results for author: Singh, A K