Search | arXiv e-print repository

Students' Reliance on AI in Higher Education: Identifying Contributing Factors

Authors: Griffin Pitts, Neha Rani, Weedguet Mildort, Eva-Marie Cook

Abstract: The increasing availability and use of artificial intelligence (AI) tools in educational settings has raised concerns about students' overreliance on these technologies. Overreliance occurs when individuals accept incorrect AI-generated recommendations, often without critical evaluation, leading to flawed problem solutions and undermining learning outcomes. This study investigates potential factor… ▽ More The increasing availability and use of artificial intelligence (AI) tools in educational settings has raised concerns about students' overreliance on these technologies. Overreliance occurs when individuals accept incorrect AI-generated recommendations, often without critical evaluation, leading to flawed problem solutions and undermining learning outcomes. This study investigates potential factors contributing to patterns of AI reliance among undergraduate students, examining not only overreliance but also appropriate reliance (correctly accepting helpful and rejecting harmful recommendations) and underreliance (incorrectly rejecting helpful recommendations). Our approach combined pre- and post-surveys with a controlled experimental task where participants solved programming problems with an AI assistant that provided both accurate and deliberately incorrect suggestions, allowing direct observation of students' reliance patterns when faced with varying AI reliability. We find that appropriate reliance is significantly related to students' programming self-efficacy, programming literacy, and need for cognition, while showing negative correlations with post-task trust and satisfaction. Overreliance showed significant correlations with post-task trust and satisfaction with the AI assistant. Underreliance was negatively correlated with programming literacy, programming self-efficacy, and need for cognition. Overall, the findings provide insights for developing targeted interventions that promote appropriate reliance on AI tools, with implications for the integration of AI in curriculum and educational technologies. △ Less

Submitted 16 June, 2025; originally announced June 2025.

ACM Class: K.3; K.4; I.2.6

arXiv:2506.12103 [pdf, other]

The Amazon Nova Family of Models: Technical Report and Model Card

Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue , et al. (761 additional authors not shown)

Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents… ▽ More We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation. △ Less

Submitted 17 March, 2025; originally announced June 2025.

Comments: 48 pages, 10 figures

Report number: 20250317

arXiv:2506.10910 [pdf, ps, other]

Magistral

Authors: Mistral-AI, :, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou , et al. (76 additional authors not shown)

Abstract: We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a s… ▽ More We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0) which further includes cold-start data from Magistral Medium. △ Less

Submitted 12 June, 2025; originally announced June 2025.

arXiv:2506.05739 [pdf, ps, other]

To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt

Authors: Zhilong Wang, Neha Nagaraja, Lan Zhang, Hayretdin Bahsi, Pawan Patil, Peng Liu

Abstract: LLM agents are widely used as agents for customer support, content generation, and code assistance. However, they are vulnerable to prompt injection attacks, where adversarial inputs manipulate the model's behavior. Traditional defenses like input sanitization, guard models, and guardrails are either cumbersome or ineffective. In this paper, we propose a novel, lightweight defense mechanism called… ▽ More LLM agents are widely used as agents for customer support, content generation, and code assistance. However, they are vulnerable to prompt injection attacks, where adversarial inputs manipulate the model's behavior. Traditional defenses like input sanitization, guard models, and guardrails are either cumbersome or ineffective. In this paper, we propose a novel, lightweight defense mechanism called Polymorphic Prompt Assembling (PPA), which protects against prompt injection with near-zero overhead. The approach is based on the insight that prompt injection requires guessing and breaking the structure of the system prompt. By dynamically varying the structure of system prompts, PPA prevents attackers from predicting the prompt structure, thereby enhancing security without compromising performance. We conducted experiments to evaluate the effectiveness of PPA against existing attacks and compared it with other defense methods. △ Less

Submitted 6 June, 2025; originally announced June 2025.

Comments: To appear in the Industry Track of the 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2025)

arXiv:2505.09970 [pdf, ps, other]

Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini

Abstract: The ReAct (Reasoning + Action) capability in large language models (LLMs) has become the foundation of modern agentic systems. Recent LLMs, such as DeepSeek-R1 and OpenAI o1/o3, exemplify this by emphasizing reasoning through the generation of ample intermediate tokens, which help build a strong premise before producing the final output tokens. In this paper, we introduce Pre-Act, a novel approach… ▽ More The ReAct (Reasoning + Action) capability in large language models (LLMs) has become the foundation of modern agentic systems. Recent LLMs, such as DeepSeek-R1 and OpenAI o1/o3, exemplify this by emphasizing reasoning through the generation of ample intermediate tokens, which help build a strong premise before producing the final output tokens. In this paper, we introduce Pre-Act, a novel approach that enhances the agent's performance by creating a multi-step execution plan along with the detailed reasoning for the given user input. This plan incrementally incorporates previous steps and tool outputs, refining itself after each step execution until the final response is obtained. Our approach is applicable to both conversational and non-conversational agents. To measure the performance of task-oriented agents comprehensively, we propose a two-level evaluation framework: (1) turn level and (2) end-to-end. Our turn-level evaluation, averaged across five models, shows that our approach, Pre-Act, outperforms ReAct by 70% in Action Recall on the Almita dataset. While this approach is effective for larger models, smaller models crucial for practical applications, where latency and cost are key constraints, often struggle with complex reasoning tasks required for agentic systems. To address this limitation, we fine-tune relatively small models such as Llama 3.1 (8B & 70B) using the proposed Pre-Act approach. Our experiments show that the fine-tuned 70B model outperforms GPT-4, achieving a 69.5% improvement in action accuracy (turn-level) and a 28% improvement in goal completion rate (end-to-end) on the Almita (out-of-domain) dataset. △ Less

Submitted 18 May, 2025; v1 submitted 15 May, 2025; originally announced May 2025.

arXiv:2504.21187 [pdf, other]

LIFT: LLM-Based Pragma Insertion for HLS via GNN Supervised Fine-Tuning

Authors: Neha Prakriya, Zijian Ding, Yizhou Sun, Jason Cong

Abstract: FPGAs are increasingly adopted in datacenter environments for their reconfigurability and energy efficiency. High-Level Synthesis (HLS) tools have eased FPGA programming by raising the abstraction level from RTL to untimed C/C++, yet attaining high performance still demands expert knowledge and iterative manual insertion of optimization pragmas to modify the microarchitecture. To address this chal… ▽ More FPGAs are increasingly adopted in datacenter environments for their reconfigurability and energy efficiency. High-Level Synthesis (HLS) tools have eased FPGA programming by raising the abstraction level from RTL to untimed C/C++, yet attaining high performance still demands expert knowledge and iterative manual insertion of optimization pragmas to modify the microarchitecture. To address this challenge, we propose LIFT, a large language model (LLM)-based coding assistant for HLS that automatically generates performance-critical pragmas given a C/C++ design. We fine-tune the LLM by tightly integrating and supervising the training process with a graph neural network (GNN), combining the sequential modeling capabilities of LLMs with the structural and semantic understanding of GNNs necessary for reasoning over code and its control/data dependencies. On average, LIFT produces designs that improve performance by 3.52x and 2.16x than prior state-of the art AutoDSE and HARP respectively, and 66x than GPT-4o. △ Less

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.19457 [pdf, other]

Towards Long Context Hallucination Detection

Authors: Siyi Liu, Kishaloy Halder, Zheng Qi, Wei Xiao, Nikolaos Pappas, Phu Mon Htut, Neha Anna John, Yassine Benajiba, Dan Roth

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, they are prone to contextual hallucination, generating information that is either unsubstantiated or contradictory to the given context. Although many studies have investigated contextual hallucinations in LLMs, addressing them in long-context inputs remains an open problem. In this work, we take a… ▽ More Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, they are prone to contextual hallucination, generating information that is either unsubstantiated or contradictory to the given context. Although many studies have investigated contextual hallucinations in LLMs, addressing them in long-context inputs remains an open problem. In this work, we take an initial step toward solving this problem by constructing a dataset specifically designed for long-context hallucination detection. Furthermore, we propose a novel architecture that enables pre-trained encoder models, such as BERT, to process long contexts and effectively detect contextual hallucinations through a decomposition and aggregation mechanism. Our experimental results show that the proposed architecture significantly outperforms previous models of similar size as well as LLM-based models across various metrics, while providing substantially faster inference. △ Less

Submitted 27 April, 2025; originally announced April 2025.

arXiv:2504.16942 [pdf, other]

S2Vec: Self-Supervised Geospatial Embeddings

Authors: Shushman Choudhury, Elad Aharoni, Chandrakumari Suvarna, Iveel Tsogsuren, Abdul Rahman Kreidieh, Chun-Ta Lu, Neha Arora

Abstract: Scalable general-purpose representations of the built environment are crucial for geospatial artificial intelligence applications. This paper introduces S2Vec, a novel self-supervised framework for learning such geospatial embeddings. S2Vec uses the S2 Geometry library to partition large areas into discrete S2 cells, rasterizes built environment feature vectors within cells as images, and applies… ▽ More Scalable general-purpose representations of the built environment are crucial for geospatial artificial intelligence applications. This paper introduces S2Vec, a novel self-supervised framework for learning such geospatial embeddings. S2Vec uses the S2 Geometry library to partition large areas into discrete S2 cells, rasterizes built environment feature vectors within cells as images, and applies masked autoencoding on these rasterized images to encode the feature vectors. This approach yields task-agnostic embeddings that capture local feature characteristics and broader spatial relationships. We evaluate S2Vec on three large-scale socioeconomic prediction tasks, showing its competitive performance against state-of-the-art image-based embeddings. We also explore the benefits of combining S2Vec embeddings with image-based embeddings downstream, showing that such multimodal fusion can often improve performance. Our results highlight how S2Vec can learn effective general-purpose geospatial representations and how it can complement other data modalities in geospatial artificial intelligence. △ Less

Submitted 10 April, 2025; originally announced April 2025.

Comments: To be submitted to ACM Transactions on Spatial Algorithms and Systems

arXiv:2504.16277 [pdf, other]

DataS^3: Dataset Subset Selection for Specialization

Authors: Neha Hulkund, Alaa Maalouf, Levi Cai, Daniel Yang, Tsun-Hsuan Wang, Abigail O'Neil, Timm Haucke, Sandeep Mukherjee, Vikram Ramaswamy, Judy Hansen Shen, Gabriel Tseng, Mike Walmsley, Daniela Rus, Ken Goldberg, Hannah Kerner, Irene Chen, Yogesh Girdhar, Sara Beery

Abstract: In many real-world machine learning (ML) applications (e.g. detecting broken bones in x-ray images, detecting species in camera traps), in practice models need to perform well on specific deployments (e.g. a specific hospital, a specific national park) rather than the domain broadly. However, deployments often have imbalanced, unique data distributions. Discrepancy between the training distributio… ▽ More In many real-world machine learning (ML) applications (e.g. detecting broken bones in x-ray images, detecting species in camera traps), in practice models need to perform well on specific deployments (e.g. a specific hospital, a specific national park) rather than the domain broadly. However, deployments often have imbalanced, unique data distributions. Discrepancy between the training distribution and the deployment distribution can lead to suboptimal performance, highlighting the need to select deployment-specialized subsets from the available training data. We formalize dataset subset selection for specialization (DS3): given a training set drawn from a general distribution and a (potentially unlabeled) query set drawn from the desired deployment-specific distribution, the goal is to select a subset of the training data that optimizes deployment performance. We introduce DataS^3; the first dataset and benchmark designed specifically for the DS3 problem. DataS^3 encompasses diverse real-world application domains, each with a set of distinct deployments to specialize in. We conduct a comprehensive study evaluating algorithms from various families--including coresets, data filtering, and data curation--on DataS^3, and find that general-distribution methods consistently fail on deployment-specific tasks. Additionally, we demonstrate the existence of manually curated (deployment-specific) expert subsets that outperform training on all available data with accuracy gains up to 51.3 percent. Our benchmark highlights the critical role of tailored dataset curation in enhancing performance and training efficiency on deployment-specific distributions, which we posit will only become more important as global, public datasets become available across domains and ML models are deployed in the real world. △ Less

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2504.06011 [pdf, other]

Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi

Authors: Monojit Choudhury, Shivam Chauhan, Rocktim Jyoti Das, Dhruv Sahnan, Xudong Han, Haonan Li, Aaryamonvikram Singh, Alok Anil Jadhav, Utkarsh Agarwal, Mukund Choudhary, Debopriyo Banerjee, Fajri Koto, Junaid Bhat, Awantika Shukla, Samujjwal Ghosh, Samta Kamboj, Onkar Pandit, Lalit Pradhan, Rahul Pal, Sunil Sahu, Soundar Doraiswamy, Parvez Mullah, Ali El Filali, Neha Sengupta, Gokul Ramakrishnan , et al. (5 additional authors not shown)

Abstract: Developing high-quality large language models (LLMs) for moderately resourced languages presents unique challenges in data availability, model adaptation, and evaluation. We introduce Llama-3-Nanda-10B-Chat, or Nanda for short, a state-of-the-art Hindi-centric instruction-tuned generative LLM, designed to push the boundaries of open-source Hindi language models. Built upon Llama-3-8B, Nanda incorp… ▽ More Developing high-quality large language models (LLMs) for moderately resourced languages presents unique challenges in data availability, model adaptation, and evaluation. We introduce Llama-3-Nanda-10B-Chat, or Nanda for short, a state-of-the-art Hindi-centric instruction-tuned generative LLM, designed to push the boundaries of open-source Hindi language models. Built upon Llama-3-8B, Nanda incorporates continuous pre-training with expanded transformer blocks, leveraging the Llama Pro methodology. A key challenge was the limited availability of high-quality Hindi text data; we addressed this through rigorous data curation, augmentation, and strategic bilingual training, balancing Hindi and English corpora to optimize cross-linguistic knowledge transfer. With 10 billion parameters, Nanda stands among the top-performing open-source Hindi and multilingual models of similar scale, demonstrating significant advantages over many existing models. We provide an in-depth discussion of training strategies, fine-tuning techniques, safety alignment, and evaluation metrics, demonstrating how these approaches enabled Nanda to achieve state-of-the-art results. By open-sourcing Nanda, we aim to advance research in Hindi LLMs and support a wide range of real-world applications across academia, industry, and public services. △ Less

Submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.02823 [pdf, other]

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection

Authors: Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari, Neha Gour, Abderaouf Behouch, Taimur Hassan, Syed Talal Wasim, Nabil Maalej, Muzammal Naseer, Juergen Gall, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi

Abstract: Advancements in Computer-Aided Screening (CAS) systems are essential for improving the detection of security threats in X-ray baggage scans. However, current datasets are limited in representing real-world, sophisticated threats and concealment tactics, and existing approaches are constrained by a closed-set paradigm with predefined labels. To address these challenges, we introduce STCray, the fir… ▽ More Advancements in Computer-Aided Screening (CAS) systems are essential for improving the detection of security threats in X-ray baggage scans. However, current datasets are limited in representing real-world, sophisticated threats and concealment tactics, and existing approaches are constrained by a closed-set paradigm with predefined labels. To address these challenges, we introduce STCray, the first multimodal X-ray baggage security dataset, comprising 46,642 image-caption paired scans across 21 threat categories, generated using an X-ray scanner for airport security. STCray is meticulously developed with our specialized protocol that ensures domain-aware, coherent captions, that lead to the multi-modal instruction following data in X-ray baggage security. This allows us to train a domain-aware visual AI assistant named STING-BEE that supports a range of vision-language tasks, including scene comprehension, referring threat localization, visual grounding, and visual question answering (VQA), establishing novel baselines for multi-modal learning in X-ray baggage security. Further, STING-BEE shows state-of-the-art generalization in cross-domain settings. Code, data, and models are available at https://divs1159.github.io/STING-BEE/. △ Less

Submitted 3 April, 2025; originally announced April 2025.

Comments: Accepted at CVPR 2025

arXiv:2503.19114 [pdf, other]

Understanding and Improving Information Preservation in Prompt Compression for LLMs

Authors: Weronika Łajewska, Momchil Hardalov, Laura Aina, Neha Anna John, Hang Su, Lluís Màrquez

Abstract: Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. Recently, various prompt compression techniques have been introduced t… ▽ More Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. Recently, various prompt compression techniques have been introduced to optimize the trade-off between reducing input length and retaining performance. We propose a holistic evaluation framework that allows for in-depth analysis of prompt compression methods. We focus on three key aspects, besides compression ratio: (i) downstream task performance, (ii) grounding in the input context, and (iii) information preservation. Through this framework, we investigate state-of-the-art soft and hard compression methods, showing that they struggle to preserve key details from the original prompt, limiting their performance on complex tasks. We demonstrate that modifying soft prompting methods to control better the granularity of the compressed information can significantly improve their effectiveness -- up to +23\% in downstream task performance, more than +8 BERTScore points in grounding, and 2.7x more entities preserved in compression. △ Less

Submitted 24 March, 2025; originally announced March 2025.

Comments: 21 pages, 6 figures, 23 tables

arXiv:2503.17114 [pdf, ps, other]

Range Avoidance in Boolean Circuits via Turan-type Bounds

Authors: Neha Kuntewar, Jayalal Sarma

Abstract: Given a circuit $C : \{0,1\}^n \to \{0,1\}^m$ from a circuit class $F$, with $m > n$, finding a $y \in \{0,1\}^m$ such that $\forall x \in \{0,1\}^n$, $C(x) \ne y$, is the range avoidance problem (denoted by $F$-$avoid$). Deterministic polynomial time algorithms (even with access to $NP$ oracles) solving this problem is known to imply explicit constructions of various pseudorandom objects like har… ▽ More Given a circuit $C : \{0,1\}^n \to \{0,1\}^m$ from a circuit class $F$, with $m > n$, finding a $y \in \{0,1\}^m$ such that $\forall x \in \{0,1\}^n$, $C(x) \ne y$, is the range avoidance problem (denoted by $F$-$avoid$). Deterministic polynomial time algorithms (even with access to $NP$ oracles) solving this problem is known to imply explicit constructions of various pseudorandom objects like hard Boolean functions, linear codes, PRGs etc. Deterministic polynomial time algorithms are known for $NC^0_2$-$avoid$ when $m > n$, and for $NC^0_3$-$avoid$ when $m \ge \frac{n^2}{\log n}$, where $NC^0_k$ is the class of circuits with bounded fan-in which have constant depth and the output depends on at most $k$ of the input bits. On the other hand, it is also known that $NC^0_3$-$avoid$ when $m = n+O\left(n^{2/3}\right)$ is at least as hard as explicit construction of rigid matrices. In this paper, we propose a new approach to solving range avoidance problem via hypergraphs. We formulate the problem in terms of Turan-type problems in hypergraphs of the following kind - for a fixed $k$-uniform hypergraph $H'$, what is the maximum number of edges that can exist in a $k$-uniform hypergraph $H$ which does not have a sub-hypergraph isomorphic to $H'$? We use our approach to show (using known Turan-type bounds) that there is a constant $c$ such that $mon$-$NC^0_3$-$avoid$ can be solved in deterministic polynomial time when $m > cn^2$. To improve the stretch constraint to linear, we show a new Turan-type theorem for a hypergraph structure (which we call the the loose $chi$-cycles) and use it to show that $mon$-$NC^0_3$-$avoid$ can be solved in deterministic polynomial time when $m > n$, thus improving the known bounds of $NC^0_3$-avoid for the case of monotone circuits. △ Less

Submitted 21 March, 2025; originally announced March 2025.

Comments: 31 pages, abstract shortened to fit in arxiv requirements

arXiv:2503.12370 [pdf, other]

Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs

Authors: Rupak Sarkar, Neha Srikanth, Taylor Hudson, Rachel Rudinger, Claire Bonial, Philip Resnik

Abstract: While it is commonly accepted that maintaining common ground plays a role in conversational success, little prior research exists connecting conversational grounding to success in task-oriented conversations. We study failures of grounding in the Ubuntu IRC dataset, where participants use text-only communication to resolve technical issues. We find that disruptions in conversational flow often ste… ▽ More While it is commonly accepted that maintaining common ground plays a role in conversational success, little prior research exists connecting conversational grounding to success in task-oriented conversations. We study failures of grounding in the Ubuntu IRC dataset, where participants use text-only communication to resolve technical issues. We find that disruptions in conversational flow often stem from a misalignment in common ground, driven by a divergence in beliefs and assumptions held by participants. These disruptions, which we call conversational friction, significantly correlate with task success. We find that although LLMs can identify overt cases of conversational friction, they struggle with subtler and more context-dependent instances requiring pragmatic or domain-specific reasoning. △ Less

Submitted 16 March, 2025; originally announced March 2025.

Comments: 8 pages

arXiv:2503.01522 [pdf, ps, other]

Byzantine Distributed Function Computation

Authors: Hari Krishnan P. Anilkumar, Neha Sangwan, Varun Narayanan, Vinod M. Prabhakaran

Abstract: We study the distributed function computation problem with $k$ users of which at most $s$ may be controlled by an adversary and characterize the set of functions of the sources the decoder can reconstruct robustly in the following sense -- if the users behave honestly, the function is recovered with high probability (w.h.p.); if they behave adversarially, w.h.p, either one of the adversarial users… ▽ More We study the distributed function computation problem with $k$ users of which at most $s$ may be controlled by an adversary and characterize the set of functions of the sources the decoder can reconstruct robustly in the following sense -- if the users behave honestly, the function is recovered with high probability (w.h.p.); if they behave adversarially, w.h.p, either one of the adversarial users will be identified or the function is recovered with vanishingly small distortion. △ Less

Submitted 10 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

arXiv:2503.01493 [pdf, other]

Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh

Authors: Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly, Yuxia Wang, Zhuohan Xie, Rahul Pal, Daniil Orel, Parvez Mullah, Diana Turmakhan, Maiya Goloburda, Mohammed Kamran, Samujjwal Ghosh, Bokang Jia, Jonibek Mansurov, Mukhammed Togmanov, Debopriyo Banerjee, Nurkhan Laiyk, Akhmed Sakip, Xudong Han, Ekaterina Kochmar, Alham Fikri Aji, Aaryamonvikram Singh, Alok Anil Jadhav, Satheesh Katipomu, Samta Kamboj , et al. (10 additional authors not shown)

Abstract: Llama-3.1-Sherkala-8B-Chat, or Sherkala-Chat (8B) for short, is a state-of-the-art instruction-tuned open generative large language model (LLM) designed for Kazakh. Sherkala-Chat (8B) aims to enhance the inclusivity of LLM advancements for Kazakh speakers. Adapted from the LLaMA-3.1-8B model, Sherkala-Chat (8B) is trained on 45.3B tokens across Kazakh, English, Russian, and Turkish. With 8 billion… ▽ More Llama-3.1-Sherkala-8B-Chat, or Sherkala-Chat (8B) for short, is a state-of-the-art instruction-tuned open generative large language model (LLM) designed for Kazakh. Sherkala-Chat (8B) aims to enhance the inclusivity of LLM advancements for Kazakh speakers. Adapted from the LLaMA-3.1-8B model, Sherkala-Chat (8B) is trained on 45.3B tokens across Kazakh, English, Russian, and Turkish. With 8 billion parameters, it demonstrates strong knowledge and reasoning abilities in Kazakh, significantly outperforming existing open Kazakh and multilingual models of similar scale while achieving competitive performance in English. We release Sherkala-Chat (8B) as an open-weight instruction-tuned model and provide a detailed overview of its training, fine-tuning, safety alignment, and evaluation, aiming to advance research and support diverse real-world applications. △ Less

Submitted 3 March, 2025; originally announced March 2025.

Comments: Technical Report

arXiv:2502.19528 [pdf, other]

Improving Simulation-Based Origin-Destination Demand Calibration Using Sample Segment Counts Data

Authors: Arwa Alanqary, Chao Zhang, Yechen Li, Neha Arora, Carolina Osorio

Abstract: This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the c… ▽ More This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the conventional reliance on a prior OD matrix. The proposed formulation aims to preserve the distribution of the observed track counts while optimizing the demand to align with observed path-level travel times. We tested this approach on Seattle's highway network with various congestion levels. Our findings reveal significant enhancements in the solution quality, particularly in accurately recovering ground truth demand patterns at both the OD and segment levels. △ Less

Submitted 26 February, 2025; originally announced February 2025.

Comments: Published in the 12th Triennial Symposium on Transportation Analysis conference (TRISTAN XII)

arXiv:2502.16008 [pdf, other]

Exact Recovery of Sparse Binary Vectors from Generalized Linear Measurements

Authors: Arya Mazumdar, Neha Sangwan

Abstract: We consider the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (such as logistic regression). We analyze the linear estimation algorithm (Plan, Vershynin, Yudovina, 2017), and also show information theoretic lower bounds on the number of required measurements. As a consequence of our results, for noisy one bit quantized linear measurements (… ▽ More We consider the problem of exact recovery of a $k$-sparse binary vector from generalized linear measurements (such as logistic regression). We analyze the linear estimation algorithm (Plan, Vershynin, Yudovina, 2017), and also show information theoretic lower bounds on the number of required measurements. As a consequence of our results, for noisy one bit quantized linear measurements ($\mathsf{1bCSbinary}$), we obtain a sample complexity of $O((k+σ^2)\log{n})$, where $σ^2$ is the noise variance. This is shown to be optimal due to the information theoretic lower bound. We also obtain tight sample complexity characterization for logistic regression. Since $\mathsf{1bCSbinary}$ is a strictly harder problem than noisy linear measurements ($\mathsf{SparseLinearReg}$) because of added quantization, the same sample complexity is achievable for $\mathsf{SparseLinearReg}$. While this sample complexity can be obtained via the popular lasso algorithm, linear estimation is computationally more efficient. Our lower bound holds for any set of measurements for $\mathsf{SparseLinearReg}$, (similar bound was known for Gaussian measurement matrices) and is closely matched by the maximum-likelihood upper bound. For $\mathsf{SparseLinearReg}$, it was conjectured in Gamarnik and Zadik, 2017 that there is a statistical-computational gap and the number of measurements should be at least $(2k+σ^2)\log{n}$ for efficient algorithms to exist. It is worth noting that our results imply that there is no such statistical-computational gap for $\mathsf{1bCSbinary}$ and logistic regression. △ Less

Submitted 21 February, 2025; originally announced February 2025.

arXiv:2502.15978 [pdf, other]

"Who Has the Time?": Understanding Receptivity to Health Chatbots among Underserved Women in India

Authors: Manvi S, Roshini Deva, Neha Madhiwalla, Azra Ismail

Abstract: Access to health information and services among women continues to be a major challenge in many communities globally. In recent years, there has been a growing interest in the potential of chatbots to address this information and access gap. We conducted interviews and focus group discussions with underserved women in urban India to understand their receptivity towards the use of chatbots for mate… ▽ More Access to health information and services among women continues to be a major challenge in many communities globally. In recent years, there has been a growing interest in the potential of chatbots to address this information and access gap. We conducted interviews and focus group discussions with underserved women in urban India to understand their receptivity towards the use of chatbots for maternal and child health, as well as barriers to their adoption. Our findings uncover gaps in digital access and literacies, and perceived conflict with various responsibilities that women are burdened with, which shape their interactions with digital technology. Our paper offers insights into the design of chatbots for community health that can meet the lived realities of women in underserved settings. △ Less

Submitted 21 February, 2025; originally announced February 2025.

Comments: This is a preprint of the paper accepted at the International Conference on Gender and Technology 2025. The final version will be available in the IEEE Xplore Digital Library

arXiv:2502.14270 [pdf, other]

Predicting Fetal Birthweight from High Dimensional Data using Advanced Machine Learning

Authors: Nachiket Kapure, Harsh Joshi, Rajeshwari Mistri, Parul Kumari, Manasi Mali, Seema Purohit, Neha Sharma, Mrityunjoy Panday, Chittaranjan S. Yajnik

Abstract: Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, struggle to achieve overlooking complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to… ▽ More Birth weight serves as a fundamental indicator of neonatal health, closely linked to both early medical interventions and long-term developmental risks. Traditional predictive models, often constrained by limited feature selection and incomplete datasets, struggle to achieve overlooking complex maternal and fetal interactions in diverse clinical settings. This research explores machine learning to address these limitations, utilizing a structured methodology that integrates advanced imputation strategies, supervised feature selection techniques, and predictive modeling. Given the constraints of the dataset, the research strengthens the role of data preprocessing in improving the model performance. Among the various methodologies explored, tree-based feature selection methods demonstrated superior capability in identifying the most relevant predictors, while ensemble-based regression models proved highly effective in capturing non-linear relationships and complex maternal-fetal interactions within the data. Beyond model performance, the study highlights the clinical significance of key physiological determinants, offering insights into maternal and fetal health factors that influence birth weight, offering insights that extend over statistical modeling. By bridging computational intelligence with perinatal research, this work underscores the transformative role of machine learning in enhancing predictive accuracy, refining risk assessment and informing data-driven decision-making in maternal and neonatal care. Keywords: Birth weight prediction, maternal-fetal health, MICE, BART, Gradient Boosting, neonatal outcomes, Clinipredictive. △ Less

Submitted 8 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

arXiv:2502.08080 [pdf, other]

NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals

Authors: Neha Srikanth, Rachel Rudinger

Abstract: Decomposition of text into atomic propositions is a flexible framework allowing for the closer inspection of input and output text. We use atomic decomposition of hypotheses in two natural language reasoning tasks, traditional NLI and defeasible NLI, to form atomic sub-problems, or granular inferences that models must weigh when solving the overall problem. These atomic sub-problems serve as a too… ▽ More Decomposition of text into atomic propositions is a flexible framework allowing for the closer inspection of input and output text. We use atomic decomposition of hypotheses in two natural language reasoning tasks, traditional NLI and defeasible NLI, to form atomic sub-problems, or granular inferences that models must weigh when solving the overall problem. These atomic sub-problems serve as a tool to further understand the structure of both NLI and defeasible reasoning, probe a model's consistency and understanding of different inferences, and measure the diversity of examples in benchmark datasets. Our results indicate that LLMs still struggle with logical consistency on atomic NLI and defeasible NLI sub-problems. Lastly, we identify critical atomic sub-problems of defeasible NLI examples, or those that most contribute to the overall label, and propose a method to measure the inferential consistency of a model, a metric designed to capture the degree to which a model makes consistently correct or incorrect predictions about the same fact under different contexts. △ Less

Submitted 7 March, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

Comments: Accepted to NAACL 2025

arXiv:2502.05227 [pdf, other]

Robotouille: An Asynchronous Planning Benchmark for LLM Agents

Authors: Gonzalo Gonzalez-Pumariega, Leong Su Yean, Neha Sunkara, Sanjiban Choudhury

Abstract: Effective asynchronous planning, or the ability to efficiently reason and plan over states and actions that must happen in parallel or sequentially, is essential for agents that must account for time delays, reason over diverse long-horizon tasks, and collaborate with other agents. While large language model (LLM) agents show promise in high-level task planning, current benchmarks focus primarily… ▽ More Effective asynchronous planning, or the ability to efficiently reason and plan over states and actions that must happen in parallel or sequentially, is essential for agents that must account for time delays, reason over diverse long-horizon tasks, and collaborate with other agents. While large language model (LLM) agents show promise in high-level task planning, current benchmarks focus primarily on short-horizon tasks and do not evaluate such asynchronous planning capabilities. We introduce Robotouille, a challenging benchmark environment designed to test LLM agents' ability to handle long-horizon asynchronous scenarios. Our synchronous and asynchronous datasets capture increasingly complex planning challenges that go beyond existing benchmarks, requiring agents to manage overlapping tasks and interruptions. Our results show that ReAct (gpt4-o) achieves 47% on synchronous tasks but only 11% on asynchronous tasks, highlighting significant room for improvement. We further analyze failure modes, demonstrating the need for LLM agents to better incorporate long-horizon feedback and self-audit their reasoning during task execution. Code is available at https://github.com/portal-cornell/robotouille. △ Less

Submitted 6 February, 2025; originally announced February 2025.

Comments: 11 pages (not including references or appendix); 41 figures (7 main paper, 34 appendix); (v1) preprint

arXiv:2501.06306 [pdf, other]

On How Traffic Signals Impact the Fundamental Diagrams of Urban Roads

Authors: Chao Zhang, Yechen Li, Neha Arora, Carolina Osorio

Abstract: Being widely adopted by the transportation and planning practitioners, the fundamental diagram (FD) is the primary tool used to relate the key macroscopic traffic variables of speed, flow, and density. We empirically analyze the relation between vehicular space-mean speeds and flows given different signal settings and postulate a parsimonious parametric function form of the traditional FD where it… ▽ More Being widely adopted by the transportation and planning practitioners, the fundamental diagram (FD) is the primary tool used to relate the key macroscopic traffic variables of speed, flow, and density. We empirically analyze the relation between vehicular space-mean speeds and flows given different signal settings and postulate a parsimonious parametric function form of the traditional FD where its function parameters are explicitly modeled as a function of the signal plan factors. We validate the proposed formulation using data from signalized urban road segments in Salt Lake City, Utah, USA. The proposed formulation builds our understanding of how changes to signal settings impact the FDs, and more generally the congestion patterns, of signalized urban segments. △ Less

Submitted 10 January, 2025; originally announced January 2025.

Comments: Published in the 4th Symposium on Management of Future Motorway and Urban Traffic Systems (MFTS)

arXiv:2501.06126 [pdf, other]

Merging Feed-Forward Sublayers for Compressed Transformers

Authors: Neha Verma, Kenton Murray, Kevin Duh

Abstract: With the rise and ubiquity of larger deep learning models, the need for high-quality compression techniques is growing in order to deploy these models widely. The sheer parameter count of these models makes it difficult to fit them into the memory constraints of different hardware. In this work, we present a novel approach to model compression by merging similar parameter groups within a model, ra… ▽ More With the rise and ubiquity of larger deep learning models, the need for high-quality compression techniques is growing in order to deploy these models widely. The sheer parameter count of these models makes it difficult to fit them into the memory constraints of different hardware. In this work, we present a novel approach to model compression by merging similar parameter groups within a model, rather than pruning away less important parameters. Specifically, we select, align, and merge separate feed-forward sublayers in Transformer models, and test our method on language modeling, image classification, and machine translation. With our method, we demonstrate performance comparable to the original models while combining more than a third of model feed-forward sublayers, and demonstrate improved performance over a strong layer-pruning baseline. For instance, we can remove over 21% of total parameters from a Vision Transformer, while maintaining 99% of its original performance. Additionally, we observe that some groups of feed-forward sublayers exhibit high activation similarity, which may help explain their surprising mergeability. △ Less

Submitted 28 March, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

arXiv:2501.04783 [pdf, other]

Traffic Simulations: Multi-City Calibration of Metropolitan Highway Networks

Authors: Chao Zhang, Yechen Li, Neha Arora, Damien Pierce, Carolina Osorio

Abstract: This paper proposes an approach to perform travel demand calibration for high-resolution stochastic traffic simulators. It employs abundant travel times at the path-level, departing from the standard practice of resorting to scarce segment-level sensor counts. The proposed approach is shown to tackle high-dimensional instances in a sample-efficient way. For the first time, case studies on 6 metrop… ▽ More This paper proposes an approach to perform travel demand calibration for high-resolution stochastic traffic simulators. It employs abundant travel times at the path-level, departing from the standard practice of resorting to scarce segment-level sensor counts. The proposed approach is shown to tackle high-dimensional instances in a sample-efficient way. For the first time, case studies on 6 metropolitan highway networks are carried out, considering a total of 54 calibration scenarios. This is the first work to show the ability of a calibration algorithm to systematically scale across networks. Compared to the state-of-the-art simultaneous perturbation stochastic approximation (SPSA) algorithm, the proposed approach enhances fit to field data by an average 43.5% with a maximum improvement of 80.0%, and does so within fewer simulation calls. △ Less

Submitted 8 January, 2025; originally announced January 2025.

Comments: Published on the 27th IEEE International Conference on Intelligent Transportation Systems (ITSC) (2024)

arXiv:2412.17899 [pdf, other]

A mixing time bound for Gibbs sampling from log-smooth log-concave distributions

Authors: Neha S. Wadia

Abstract: The Gibbs sampler, also known as the coordinate hit-and-run algorithm, is a Markov chain that is widely used to draw samples from probability distributions in arbitrary dimensions. At each iteration of the algorithm, a randomly selected coordinate is resampled from the distribution that results from conditioning on all the other coordinates. We study the behavior of the Gibbs sampler on the class… ▽ More The Gibbs sampler, also known as the coordinate hit-and-run algorithm, is a Markov chain that is widely used to draw samples from probability distributions in arbitrary dimensions. At each iteration of the algorithm, a randomly selected coordinate is resampled from the distribution that results from conditioning on all the other coordinates. We study the behavior of the Gibbs sampler on the class of log-smooth and strongly log-concave target distributions supported on $\mathbb{R}^n$. Assuming the initial distribution is $M$-warm with respect to the target, we show that the Gibbs sampler requires at most $O^{\star}\left(κ^2 n^{7.5}\left(\max\left\{1,\sqrt{\frac{1}{n}\log \frac{2M}γ}\right\}\right)^2\right)$ steps to produce a sample with error no more than $γ$ in total variation distance from a distribution with condition number $κ$. △ Less

Submitted 23 December, 2024; originally announced December 2024.

Comments: 22 pages, 4 figures

arXiv:2412.14089 [pdf, other]

doi 10.1145/3589132.3625566

On the Use of Abundant Road Speed Data for Travel Demand Calibration of Urban Traffic Simulators

Authors: Suyash Vishnoi, Akhil Shetty, Iveel Tsogsuren, Neha Arora, Carolina Osorio

Abstract: This work develops a compute-efficient algorithm to tackle a fundamental problem in transportation: that of urban travel demand estimation. It focuses on the calibration of origin-destination travel demand input parameters for high-resolution traffic simulation models. It considers the use of abundant traffic road speed data. The travel demand calibration problem is formulated as a continuous, hig… ▽ More This work develops a compute-efficient algorithm to tackle a fundamental problem in transportation: that of urban travel demand estimation. It focuses on the calibration of origin-destination travel demand input parameters for high-resolution traffic simulation models. It considers the use of abundant traffic road speed data. The travel demand calibration problem is formulated as a continuous, high-dimensional, simulation-based optimization (SO) problem with bound constraints. There is a lack of compute efficient algorithms to tackle this problem. We propose the use of an SO algorithm that relies on an efficient, analytical, differentiable, physics-based traffic model, known as a metamodel or surrogate model. We formulate a metamodel that enables the use of road speed data. Tests are performed on a Salt Lake City network. We study how the amount of data, as well as the congestion levels, impact both in-sample and out-of-sample performance. The proposed method outperforms the benchmark for both in-sample and out-of-sample performance by 84.4% and 72.2% in terms of speeds and counts, respectively. Most importantly, the proposed method yields the highest compute efficiency, identifying solutions with good performance within few simulation function evaluations (i.e., with small samples). △ Less

Submitted 18 December, 2024; originally announced December 2024.

Comments: 4 pages

Journal ref: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, pp. 1-4. 2023

arXiv:2412.09411 [pdf, other]

Resilience for Regular Path Queries: Towards a Complexity Classification

Authors: Antoine Amarilli, Wolfgang Gatterbauer, Neha Makhija, Mikaël Monet

Abstract: The resilience problem for a query and an input set or bag database is to compute the minimum number of facts to remove from the database to make the query false. In this paper, we study how to compute the resilience of Regular Path Queries (RPQs) over graph databases. Our goal is to characterize the regular languages $L$ for which it is tractable to compute the resilience of the existentially-qua… ▽ More The resilience problem for a query and an input set or bag database is to compute the minimum number of facts to remove from the database to make the query false. In this paper, we study how to compute the resilience of Regular Path Queries (RPQs) over graph databases. Our goal is to characterize the regular languages $L$ for which it is tractable to compute the resilience of the existentially-quantified RPQ built from $L$. We show that computing the resilience in this sense is tractable (even in combined complexity) for all RPQs defined from so-called local languages. By contrast, we show hardness in data complexity for RPQs defined from the following language classes (after reducing the languages to eliminate redundant words): all finite languages featuring a word containing a repeated letter, and all languages featuring a specific kind of counterexample to being local (which we call four-legged languages). The latter include in particular all languages that are not star-free. Our results also imply hardness for all non-local languages with a so-called neutral letter. We last highlight some remaining obstacles towards a full dichotomy. △ Less

Submitted 23 March, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

Comments: 48 pages, 17 figures. Minor updates relative to version 1. This version includes the appendices with all proofs, and all reviewer feedback: it is identical to the PODS'25 publication up to minor formatting differences

arXiv:2412.09230 [pdf, other]

Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering

Authors: Sai Bhargav Rongali, Mohamad Hassan N C, Ankit Jha, Neha Bhargava, Saurabh Prasad, Biplab Banerjee

Abstract: This paper tackles the intricate challenge of video question-answering (VideoQA). Despite notable progress, current methods fall short of effectively integrating questions with video frames and semantic object-level abstractions to create question-aware video representations. We introduce Local-Global Question Aware Video Embedding (LGQAVE), which incorporates three major innovations to integrate… ▽ More This paper tackles the intricate challenge of video question-answering (VideoQA). Despite notable progress, current methods fall short of effectively integrating questions with video frames and semantic object-level abstractions to create question-aware video representations. We introduce Local-Global Question Aware Video Embedding (LGQAVE), which incorporates three major innovations to integrate multi-modal knowledge better and emphasize semantic visual concepts relevant to specific questions. LGQAVE moves beyond traditional ad-hoc frame sampling by utilizing a cross-attention mechanism that precisely identifies the most relevant frames concerning the questions. It captures the dynamics of objects within these frames using distinct graphs, grounding them in question semantics with the miniGPT model. These graphs are processed by a question-aware dynamic graph transformer (Q-DGT), which refines the outputs to develop nuanced global and local video representations. An additional cross-attention module integrates these local and global embeddings to generate the final video embeddings, which a language model uses to generate answers. Extensive evaluations across multiple benchmarks demonstrate that LGQAVE significantly outperforms existing models in delivering accurate multi-choice and open-ended answers. △ Less

Submitted 12 December, 2024; originally announced December 2024.

Journal ref: WACV2025

arXiv:2412.05724 [pdf, other]

A Tiered GAN Approach for Monet-Style Image Generation

Authors: FNU Neha, Deepshikha Bhati, Deepak Kumar Shukla, Md Amiruzzaman

Abstract: Generative Adversarial Networks (GANs) have proven to be a powerful tool in generating artistic images, capable of mimicking the styles of renowned painters, such as Claude Monet. This paper introduces a tiered GAN model to progressively refine image quality through a multi-stage process, enhancing the generated images at each step. The model transforms random noise into detailed artistic represen… ▽ More Generative Adversarial Networks (GANs) have proven to be a powerful tool in generating artistic images, capable of mimicking the styles of renowned painters, such as Claude Monet. This paper introduces a tiered GAN model to progressively refine image quality through a multi-stage process, enhancing the generated images at each step. The model transforms random noise into detailed artistic representations, addressing common challenges such as instability in training, mode collapse, and output quality. This approach combines downsampling and convolutional techniques, enabling the generation of high-quality Monet-style artwork while optimizing computational efficiency. Experimental results demonstrate the architecture's ability to produce foundational artistic structures, though further refinements are necessary for achieving higher levels of realism and fidelity to Monet's style. Future work focuses on improving training methodologies and model complexity to bridge the gap between generated and true artistic images. Additionally, the limitations of traditional GANs in artistic generation are analyzed, and strategies to overcome these shortcomings are proposed. △ Less

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2412.05686 [pdf, other]

Neural network interpretability with layer-wise relevance propagation: novel techniques for neuron selection and visualization

Authors: Deepshikha Bhati, Fnu Neha, Md Amiruzzaman, Angela Guercio, Deepak Kumar Shukla, Ben Ward

Abstract: Interpreting complex neural networks is crucial for understanding their decision-making processes, particularly in applications where transparency and accountability are essential. This proposed method addresses this need by focusing on layer-wise Relevance Propagation (LRP), a technique used in explainable artificial intelligence (XAI) to attribute neural network outputs to input features through… ▽ More Interpreting complex neural networks is crucial for understanding their decision-making processes, particularly in applications where transparency and accountability are essential. This proposed method addresses this need by focusing on layer-wise Relevance Propagation (LRP), a technique used in explainable artificial intelligence (XAI) to attribute neural network outputs to input features through backpropagated relevance scores. Existing LRP methods often struggle with precision in evaluating individual neuron contributions. To overcome this limitation, we present a novel approach that improves the parsing of selected neurons during LRP backward propagation, using the Visual Geometry Group 16 (VGG16) architecture as a case study. Our method creates neural network graphs to highlight critical paths and visualizes these paths with heatmaps, optimizing neuron selection through accuracy metrics like Mean Squared Error (MSE) and Symmetric Mean Absolute Percentage Error (SMAPE). Additionally, we utilize a deconvolutional visualization technique to reconstruct feature maps, offering a comprehensive view of the network's inner workings. Extensive experiments demonstrate that our approach enhances interpretability and supports the development of more transparent artificial intelligence (AI) systems for computer vision applications. This advancement has the potential to improve the trustworthiness of AI models in real-world machine vision applications, thereby increasing their reliability and effectiveness. △ Less

Submitted 7 December, 2024; originally announced December 2024.

arXiv:2412.05252 [pdf, other]

From classical techniques to convolution-based models: A review of object detection algorithms

Authors: Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Md Amiruzzaman

Abstract: Object detection is a fundamental task in computer vision and image understanding, with the goal of identifying and localizing objects of interest within an image while assigning them corresponding class labels. Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance. These methods combined low-level features w… ▽ More Object detection is a fundamental task in computer vision and image understanding, with the goal of identifying and localizing objects of interest within an image while assigning them corresponding class labels. Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance. These methods combined low-level features with contextual information and lacked the ability to capture high-level semantics. Deep learning, especially Convolutional Neural Networks (CNNs), addressed these limitations by automatically learning rich, hierarchical features directly from data. These features include both semantic and high-level representations essential for accurate object detection. This paper reviews object detection frameworks, starting with classical computer vision methods. We categorize object detection approaches into two groups: (1) classical computer vision techniques and (2) CNN-based detectors. We compare major CNN models, discussing their strengths and limitations. In conclusion, this review highlights the significant advancements in object detection through deep learning and identifies key areas for further research to improve performance. △ Less

Submitted 6 December, 2024; originally announced December 2024.

arXiv:2412.03933 [pdf, other]

Exploring AI Text Generation, Retrieval-Augmented Generation, and Detection Technologies: a Comprehensive Overview

Authors: Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Angela Guercio, Ben Ward

Abstract: The rapid development of Artificial Intelligence (AI) has led to the creation of powerful text generation models, such as large language models (LLMs), which are widely used for diverse applications. However, concerns surrounding AI-generated content, including issues of originality, bias, misinformation, and accountability, have become increasingly prominent. This paper offers a comprehensive ove… ▽ More The rapid development of Artificial Intelligence (AI) has led to the creation of powerful text generation models, such as large language models (LLMs), which are widely used for diverse applications. However, concerns surrounding AI-generated content, including issues of originality, bias, misinformation, and accountability, have become increasingly prominent. This paper offers a comprehensive overview of AI text generators (AITGs), focusing on their evolution, capabilities, and ethical implications. This paper also introduces Retrieval-Augmented Generation (RAG), a recent approach that improves the contextual relevance and accuracy of text generation by integrating dynamic information retrieval. RAG addresses key limitations of traditional models, including their reliance on static knowledge and potential inaccuracies in handling real-world data. Additionally, the paper reviews detection tools that help differentiate AI-generated text from human-written content and discusses the ethical challenges these technologies pose. The paper explores future directions for improving detection accuracy, supporting ethical AI development, and increasing accessibility. The paper contributes to a more responsible and reliable use of AI in content creation through these discussions. △ Less

Submitted 5 December, 2024; originally announced December 2024.

arXiv:2412.02242 [pdf, other]

U-Net in Medical Image Segmentation: A Review of Its Applications Across Modalities

Authors: Fnu Neha, Deepshikha Bhati, Deepak Kumar Shukla, Sonavi Makarand Dalvi, Nikolaos Mantzou, Safa Shubbar

Abstract: Medical imaging is essential in healthcare to provide key insights into patient anatomy and pathology, aiding in diagnosis and treatment. Non-invasive techniques such as X-ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Ultrasound (US), capture detailed images of organs, tissues, and abnormalities. Effective analysis of these images requires precise segmentation to delineate r… ▽ More Medical imaging is essential in healthcare to provide key insights into patient anatomy and pathology, aiding in diagnosis and treatment. Non-invasive techniques such as X-ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Ultrasound (US), capture detailed images of organs, tissues, and abnormalities. Effective analysis of these images requires precise segmentation to delineate regions of interest (ROI), such as organs or lesions. Traditional segmentation methods, relying on manual feature-extraction, are labor-intensive and vary across experts. Recent advancements in Artificial Intelligence (AI) and Deep Learning (DL), particularly convolutional models such as U-Net and its variants (U-Net++ and U-Net 3+), have transformed medical image segmentation (MIS) by automating the process and enhancing accuracy. These models enable efficient, precise pixel-wise classification across various imaging modalities, overcoming the limitations of manual segmentation. This review explores various medical imaging techniques, examines the U-Net architectures and their adaptations, and discusses their application across different modalities. It also identifies common challenges in MIS and proposes potential solutions. △ Less

Submitted 3 December, 2024; originally announced December 2024.

arXiv:2412.02166 [pdf, other]

Analyzing the Impact of AI Tools on Student Study Habits and Academic Performance

Authors: Ben Ward, Deepshikha Bhati, Fnu Neha, Angela Guercio

Abstract: This study explores the effectiveness of AI tools in enhancing student learning, specifically in improving study habits, time management, and feedback mechanisms. The research focuses on how AI tools can support personalized learning, adaptive test adjustments, and provide real-time classroom analysis. Student feedback revealed strong support for these features, and the study found a significant r… ▽ More This study explores the effectiveness of AI tools in enhancing student learning, specifically in improving study habits, time management, and feedback mechanisms. The research focuses on how AI tools can support personalized learning, adaptive test adjustments, and provide real-time classroom analysis. Student feedback revealed strong support for these features, and the study found a significant reduction in study hours alongside an increase in GPA, suggesting positive academic outcomes. Despite these benefits, challenges such as over-reliance on AI and difficulties in integrating AI with traditional teaching methods were also identified, emphasizing the need for AI tools to complement conventional educational strategies rather than replace them. Data were collected through a survey with a Likert scale and follow-up interviews, providing both quantitative and qualitative insights. The analysis involved descriptive statistics to summarize demographic data, AI usage patterns, and perceived effectiveness, as well as inferential statistics (T-tests, ANOVA) to examine the impact of demographic factors on AI adoption. Regression analysis identified predictors of AI adoption, and qualitative responses were thematically analyzed to understand students' perspectives on the future of AI in education. This mixed-methods approach provided a comprehensive view of AI's role in education and highlighted the importance of privacy, transparency, and continuous refinement of AI features to maximize their educational benefits. △ Less

Submitted 2 December, 2024; originally announced December 2024.

arXiv:2411.17603 [pdf, ps, other]

Is Integer Linear Programming All You Need for Deletion Propagation? A Unified and Practical Approach for Generalized Deletion Propagation

Authors: Neha Makhija, Wolfgang Gatterbauer

Abstract: Deletion Propagation (DP) refers to a family of database problems rooted in the classical view-update problem: how to propagate intended deletions in a view (query output) back to the source database while satisfying constraints and minimizing side effects. Although studied for over 40 years, DP variants, their complexities, and practical algorithms have been typically explored in isolation. Thi… ▽ More Deletion Propagation (DP) refers to a family of database problems rooted in the classical view-update problem: how to propagate intended deletions in a view (query output) back to the source database while satisfying constraints and minimizing side effects. Although studied for over 40 years, DP variants, their complexities, and practical algorithms have been typically explored in isolation. This work presents a unified and generalized framework for DP with several key benefits: (1) It unifies and generalizes all previously known DP variants, effectively subsuming them within a broader class of problems, including new, well-motivated variants. (2) It comes with a practical and general-purpose algorithm that is ``coarse-grained instance-optimal'': it runs in PTIME for all known PTIME cases and can automatically exploit structural regularities in the data, i.e. it does not rely on hints about such regularities as part of the input. (3) It is complete: our framework handles all known DP variants in all settings (including those involving self-joins, unions, and bag semantics), and allows us to provide new complexity results. (4) It is easy to implement and, in many cases, outperforms prior variant-specific solutions, sometimes by orders of magnitude. We provide the first experimental results for several DP variants previously studied only in theory. △ Less

Submitted 16 June, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

Comments: 19 pages, 12 figures

arXiv:2411.14437 [pdf]

Transforming Business with Generative AI: Research, Innovation, Market Deployment and Future Shifts in Business Models

Authors: Narotam Singh, Vaibhav Chaudhary, Nimisha Singh, Neha Soni, Amita Kapoor

Abstract: This paper explores the transformative impact of Generative AI (GenAI) on the business landscape, examining its role in reshaping traditional business models, intensifying market competition, and fostering innovation. By applying the principles of Neo-Schumpeterian economics, the research analyses how GenAI is driving a new wave of "creative destruction," leading to the emergence of novel business… ▽ More This paper explores the transformative impact of Generative AI (GenAI) on the business landscape, examining its role in reshaping traditional business models, intensifying market competition, and fostering innovation. By applying the principles of Neo-Schumpeterian economics, the research analyses how GenAI is driving a new wave of "creative destruction," leading to the emergence of novel business paradigms and value propositions. The findings reveal that GenAI enhances operational efficiency, facilitates product and service innovation, and creates new revenue streams, positioning it as a powerful catalyst for substantial shifts in business structures and strategies. However, the deployment of GenAI also presents significant challenges, including ethical concerns, regulatory demands, and the risk of job displacement. By addressing the multifarious nature of GenAI, this paper provides valuable insights for business leaders, policymakers, and researchers, guiding them towards a balanced and responsible integration of this transformative technology. Ultimately, GenAI is not merely a technological advancement but a driver of profound change, heralding a future where creativity, efficiency, and growth are redefined. △ Less

Submitted 4 November, 2024; originally announced November 2024.

Comments: 30 pages, 12 figures, original submission

MSC Class: 68T05 (Primary); 68T50; 91B84; 91B69; 62H30 (Secondary) ACM Class: I.2.7; I.2.6; K.4.1; K.6.1; J.4; H.1.2

arXiv:2411.10548 [pdf, ps, other]

BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

Authors: Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef , et al. (68 additional authors not shown)

Abstract: Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational bio… ▽ More Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use. △ Less

Submitted 12 June, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

arXiv:2410.20231 [pdf, other]

CAVE-Net: Classifying Abnormalities in Video Capsule Endoscopy

Authors: Ishita Harish, Saurav Mishra, Neha Bhadoria, Rithik Kumar, Madhav Arora, Syed Rameem Zahra, Ankur Gupta

Abstract: Accurate classification of medical images is critical for detecting abnormalities in the gastrointestinal tract, a domain where misclassification can significantly impact patient outcomes. We propose an ensemble-based approach to improve diagnostic accuracy in analyzing complex image datasets. Using a Convolutional Block Attention Module along with a Deep Neural Network, we leverage the unique fea… ▽ More Accurate classification of medical images is critical for detecting abnormalities in the gastrointestinal tract, a domain where misclassification can significantly impact patient outcomes. We propose an ensemble-based approach to improve diagnostic accuracy in analyzing complex image datasets. Using a Convolutional Block Attention Module along with a Deep Neural Network, we leverage the unique feature extraction capabilities of each model to enhance the overall accuracy. The classification models, such as Random Forest, XGBoost, Support Vector Machine and K-Nearest Neighbors are introduced to further diversify the predictive power of proposed ensemble. By using these methods, the proposed framework, CAVE-Net, provides robust feature discrimination and improved classification results. Experimental evaluations demonstrate that the CAVE-Net achieves high accuracy and robustness across challenging and imbalanced classes, showing significant promise for broader applications in computer vision tasks. △ Less

Submitted 30 December, 2024; v1 submitted 26 October, 2024; originally announced October 2024.

arXiv:2410.19656 [pdf, other]

APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs

Authors: Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury

Abstract: Home robots performing personalized tasks must adeptly balance user preferences with environmental affordances. We focus on organization tasks within constrained spaces, such as arranging items into a refrigerator, where preferences for placement collide with physical limitations. The robot must infer user preferences based on a small set of demonstrations, which is easier for users to provide tha… ▽ More Home robots performing personalized tasks must adeptly balance user preferences with environmental affordances. We focus on organization tasks within constrained spaces, such as arranging items into a refrigerator, where preferences for placement collide with physical limitations. The robot must infer user preferences based on a small set of demonstrations, which is easier for users to provide than extensively defining all their requirements. While recent works use Large Language Models (LLMs) to learn preferences from user demonstrations, they encounter two fundamental challenges. First, there is inherent ambiguity in interpreting user actions, as multiple preferences can often explain a single observed behavior. Second, not all user preferences are practically feasible due to geometric constraints in the environment. To address these challenges, we introduce APRICOT, a novel approach that merges LLM-based Bayesian active preference learning with constraint-aware task planning. APRICOT refines its generated preferences by actively querying the user and dynamically adapts its plan to respect environmental constraints. We evaluate APRICOT on a dataset of diverse organization tasks and demonstrate its effectiveness in real-world scenarios, showing significant improvements in both preference satisfaction and plan feasibility. The project website is at https://portal-cornell.github.io/apricot/ △ Less

Submitted 25 October, 2024; originally announced October 2024.

Comments: Conference on Robot Learning (CoRL) 2024

arXiv:2410.15472 [pdf]

doi 10.11159/icbes24.158

Multi-Layer Feature Fusion with Cross-Channel Attention-Based U-Net for Kidney Tumor Segmentation

Authors: Fnu Neha, Arvind K. Bansal

Abstract: Renal tumors, especially renal cell carcinoma (RCC), show significant heterogeneity, posing challenges for diagnosis using radiology images such as MRI, echocardiograms, and CT scans. U-Net based deep learning techniques are emerging as a promising approach for automated medical image segmentation for minimally invasive diagnosis of renal tumors. However, current techniques need further improvemen… ▽ More Renal tumors, especially renal cell carcinoma (RCC), show significant heterogeneity, posing challenges for diagnosis using radiology images such as MRI, echocardiograms, and CT scans. U-Net based deep learning techniques are emerging as a promising approach for automated medical image segmentation for minimally invasive diagnosis of renal tumors. However, current techniques need further improvements in accuracy to become clinically useful to radiologists. In this study, we present an improved U-Net based model for end-to-end automated semantic segmentation of CT scan images to identify renal tumors. The model uses residual connections across convolution layers, integrates a multi-layer feature fusion (MFF) and cross-channel attention (CCA) within encoder blocks, and incorporates skip connections augmented with additional information derived using MFF and CCA. We evaluated our model on the KiTS19 dataset, which contains data from 210 patients. For kidney segmentation, our model achieves a Dice Similarity Coefficient (DSC) of 0.97 and a Jaccard index (JI) of 0.95. For renal tumor segmentation, our model achieves a DSC of 0.96 and a JI of 0.91. Based on a comparison of available DSC scores, our model outperforms the current leading models. △ Less

Submitted 21 October, 2024; v1 submitted 20 October, 2024; originally announced October 2024.

Comments: 8 pages

Journal ref: Proceedings of the 10th World Congress on Electrical Engineering and Computer Systems and Science, Avestia Publishing, ISSN = 2369-811X, 2024

arXiv:2410.15229 [pdf]

doi 10.1080/19490976.2025.2505115

Deep Learning-based Detection of Bacterial Swarm Motion Using a Single Image

Authors: Yuzhu Li, Hao Li, Weijie Chen, Keelan O'Riordan, Neha Mani, Yuxuan Qi, Tairan Liu, Sridhar Mani, Aydogan Ozcan

Abstract: Distinguishing between swarming and swimming, the two principal forms of bacterial movement, holds significant conceptual and clinical relevance. This is because bacteria that exhibit swarming capabilities often possess unique properties crucial to the pathogenesis of infectious diseases and may also have therapeutic potential. Here, we report a deep learning-based swarming classifier that rapidly… ▽ More Distinguishing between swarming and swimming, the two principal forms of bacterial movement, holds significant conceptual and clinical relevance. This is because bacteria that exhibit swarming capabilities often possess unique properties crucial to the pathogenesis of infectious diseases and may also have therapeutic potential. Here, we report a deep learning-based swarming classifier that rapidly and autonomously predicts swarming probability using a single blurry image. Compared with traditional video-based, manually-processed approaches, our method is particularly suited for high-throughput environments and provides objective, quantitative assessments of swarming probability. The swarming classifier demonstrated in our work was trained on Enterobacter sp. SM3 and showed good performance when blindly tested on new swarming (positive) and swimming (negative) test images of SM3, achieving a sensitivity of 97.44% and a specificity of 100%. Furthermore, this classifier demonstrated robust external generalization capabilities when applied to unseen bacterial species, such as Serratia marcescens DB10 and Citrobacter koseri H6. It blindly achieved a sensitivity of 97.92% and a specificity of 96.77% for DB10, and a sensitivity of 100% and a specificity of 97.22% for H6. This competitive performance indicates the potential to adapt our approach for diagnostic applications through portable devices or even smartphones. This adaptation would facilitate rapid, objective, on-site screening for bacterial swarming motility, potentially enhancing the early detection and treatment assessment of various diseases, including inflammatory bowel diseases (IBD) and urinary tract infections (UTI). △ Less

Submitted 19 October, 2024; originally announced October 2024.

Comments: 17 Pages, 4 Figures

Journal ref: Gut Microbes (2025)

arXiv:2410.12311 [pdf, other]

Open Domain Question Answering with Conflicting Contexts

Authors: Siyi Liu, Qiang Ning, Kishaloy Halder, Wei Xiao, Zheng Qi, Phu Mon Htut, Yi Zhang, Neha Anna John, Bonan Min, Yassine Benajiba, Dan Roth

Abstract: Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections of text often contain conflicting information, and indiscriminately depending on this information may result in untruthful and inaccurate answers. To understand the gravity of this problem, we collect a human-annotated datas… ▽ More Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections of text often contain conflicting information, and indiscriminately depending on this information may result in untruthful and inaccurate answers. To understand the gravity of this problem, we collect a human-annotated dataset, Question Answering with Conflicting Contexts (QACC), and find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search. We evaluate and benchmark three powerful Large Language Models (LLMs) with our dataset QACC and demonstrate their limitations in effectively addressing questions with conflicting information. To explore how humans reason through conflicting contexts, we request our annotators to provide explanations for their selections of correct answers. We demonstrate that by finetuning LLMs to explain their answers, we can introduce richer information into their training that guide them through the process of reasoning with conflicting contexts. △ Less

Submitted 27 April, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.11967 [pdf]

Integrating Artificial Intelligence Models and Synthetic Image Data for Enhanced Asset Inspection and Defect Identification

Authors: Reddy Mandati, Vladyslav Anderson, Po-chen Chen, Ankush Agarwal, Tatjana Dokic, David Barnard, Michael Finn, Jesse Cromer, Andrew Mccauley, Clay Tutaj, Neha Dave, Bobby Besharati, Jamie Barnett, Timothy Krall

Abstract: In the past utilities relied on in-field inspections to identify asset defects. Recently, utilities have started using drone-based inspections to enhance the field-inspection process. We consider a vast repository of drone images, providing a wealth of information about asset health and potential issues. However, making the collected imagery data useful for automated defect detection requires sign… ▽ More In the past utilities relied on in-field inspections to identify asset defects. Recently, utilities have started using drone-based inspections to enhance the field-inspection process. We consider a vast repository of drone images, providing a wealth of information about asset health and potential issues. However, making the collected imagery data useful for automated defect detection requires significant manual labeling effort. We propose a novel solution that combines synthetic asset defect images with manually labeled drone images. This solution has several benefits: improves performance of defect detection, reduces the number of hours spent on manual labeling, and enables the capability to generate realistic images of rare defects where not enough real-world data is available. We employ a workflow that combines 3D modeling tools such as Maya and Unreal Engine to create photorealistic 3D models and 2D renderings of defective assets and their surroundings. These synthetic images are then integrated into our training pipeline augmenting the real data. This study implements an end-to-end Artificial Intelligence solution to detect assets and asset defects from the combined imagery repository. The unique contribution of this research lies in the application of advanced computer vision models and the generation of photorealistic 3D renderings of defective assets, aiming to transform the asset inspection process. Our asset detection model has achieved an accuracy of 92 percent, we achieved a performance lift of 67 percent when introducing approximately 2,000 synthetic images of 2k resolution. In our tests, the defect detection model achieved an accuracy of 73 percent across two batches of images. Our analysis demonstrated that synthetic data can be successfully used in place of real-world manually labeled data to train defect detection model. △ Less

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.09047 [pdf, other]

Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models

Authors: Qin Liu, Chao Shang, Ling Liu, Nikolaos Pappas, Jie Ma, Neha Anna John, Srikanth Doss, Lluis Marquez, Miguel Ballesteros, Yassine Benajiba

Abstract: The safety alignment ability of Vision-Language Models (VLMs) is prone to be degraded by the integration of the vision module compared to its LLM backbone. We investigate this phenomenon, dubbed as ''safety alignment degradation'' in this paper, and show that the challenge arises from the representation gap that emerges when introducing vision modality to VLMs. In particular, we show that the repr… ▽ More The safety alignment ability of Vision-Language Models (VLMs) is prone to be degraded by the integration of the vision module compared to its LLM backbone. We investigate this phenomenon, dubbed as ''safety alignment degradation'' in this paper, and show that the challenge arises from the representation gap that emerges when introducing vision modality to VLMs. In particular, we show that the representations of multi-modal inputs shift away from that of text-only inputs which represent the distribution that the LLM backbone is optimized for. At the same time, the safety alignment capabilities, initially developed within the textual embedding space, do not successfully transfer to this new multi-modal representation space. To reduce safety alignment degradation, we introduce Cross-Modality Representation Manipulation (CMRM), an inference time representation intervention method for recovering the safety alignment ability that is inherent in the LLM backbone of VLMs, while simultaneously preserving the functional capabilities of VLMs. The empirical results show that our framework significantly recovers the alignment ability that is inherited from the LLM backbone with minimal impact on the fluency and linguistic capabilities of pre-trained VLMs even without additional training. Specifically, the unsafe rate of LLaVA-7B on multi-modal input can be reduced from 61.53% to as low as 3.15% with only inference-time intervention. WARNING: This paper contains examples of toxic or harmful language. △ Less

Submitted 11 October, 2024; originally announced October 2024.

Comments: Preprint

arXiv:2410.00260 [pdf, other]

DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining

Authors: Vinayak Arannil, Neha Narwal, Sourav Sanjukta Bhabesh, Sai Nikhil Thirandas, Darren Yow-Bang Wang, Graham Horwood, Alex Anto Chirayath, Gouri Pandeshwar

Abstract: Large Language Models (LLMs) have shown remarkable ability to generalize effectively across numerous industry domains while executing a range of tasks. Many of these competencies are obtained from the data utilized during the pre-training phase of the Language Models (LMs). However, these models exhibit limitations when tasked with performing in specialized or low-resource industry domains. More r… ▽ More Large Language Models (LLMs) have shown remarkable ability to generalize effectively across numerous industry domains while executing a range of tasks. Many of these competencies are obtained from the data utilized during the pre-training phase of the Language Models (LMs). However, these models exhibit limitations when tasked with performing in specialized or low-resource industry domains. More recent approaches use LLMs for generating domain-specific synthetic data but most often they lack in truthfulness and complexity. Alternatively, in cases where domain data is available like healthcare and finance most of the LMs are proprietary necessitating the need for a scalable method to curate real world industry specific pre-training data. In this work, we propose an automated and scalable framework - DoPAMine:Domain-specific Pre-training Adaptation from seed-guided data Mining, to mine domain specific training data from a large data corpus for domain adaptation of a LM. The framework leverages the parametric knowledge of a LLM to generate diverse and representative seed data tailored to a specific domain which is then used to mine real world data from a large data corpus like Common Crawl. We evaluated our framework's performance in the continual pre-training (CPT) setting by training two domain specific 7B parameter LMs in healthcare and finance with data mined via DoPAMine. Our experiments show that DoPAMine boosts the performance of pre-trained LLMs on average by 4.9% and 5.1% in zero-shot and 5-shot settings respectively on healthcare tasks from MMLU, MedQA, MedMCQA and PubMedQA datasets, and 2.9% and 6.7% for zero-shot and 5-shot settings respectively on finance tasks from FiQA-SA, FPB and Headlines datasets when compared to the baseline. △ Less

Submitted 9 October, 2024; v1 submitted 30 September, 2024; originally announced October 2024.

arXiv:2409.16560 [pdf, other]

Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference

Authors: Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun

Abstract: Large language models (LLMs) have shown outstanding performance across numerous real-world tasks. However, the autoregressive nature of these models makes the inference process slow and costly. Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens, which are then validated simultaneously by the larger model, achieving a speed-up of 1-… ▽ More Large language models (LLMs) have shown outstanding performance across numerous real-world tasks. However, the autoregressive nature of these models makes the inference process slow and costly. Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens, which are then validated simultaneously by the larger model, achieving a speed-up of 1-2x. Although speculative decoding matches the same distribution as multinomial sampling, multinomial sampling itself is prone to suboptimal outputs, whereas beam sampling is widely recognized for producing higher-quality results by maintaining multiple candidate sequences at each step. This paper explores the novel integration of speculative decoding with beam sampling. However, there are four key challenges: (1) how to generate multiple sequences from the larger model's distribution given drafts sequences from the small model; (2) how to dynamically optimize the number of beams to balance efficiency and accuracy; (3) how to efficiently verify the multiple drafts in parallel; and (4) how to address the extra memory costs inherent in beam sampling. To address these challenges, we propose dynamic-width speculative beam decoding (DSBD). Specifically, we first introduce a novel draft and verification scheme that generates multiple sequences following the large model's distribution based on beam sampling trajectories from the small model. Then, we introduce an adaptive mechanism to dynamically tune the number of beams based on the context, optimizing efficiency and effectiveness. Besides, we extend tree-based parallel verification to handle multiple trees simultaneously, accelerating the verification process. Finally, we illustrate a simple modification to our algorithm to mitigate the memory overhead of beam sampling... △ Less

Submitted 14 March, 2025; v1 submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.06131 [pdf, other]

Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review

Authors: Neha Prakriya, Jui-Nan Yen, Cho-Jui Hsieh, Jason Cong

Abstract: Traditional Large Language Model (LLM) pretraining relies on autoregressive language modeling with randomly sampled data from web-scale datasets. Inspired by human learning techniques like spaced repetition, we hypothesize that random sampling leads to high training costs, lower-quality models, and significant data forgetting. To address these inefficiencies, we propose the Learn-Focus-Review (LFR… ▽ More Traditional Large Language Model (LLM) pretraining relies on autoregressive language modeling with randomly sampled data from web-scale datasets. Inspired by human learning techniques like spaced repetition, we hypothesize that random sampling leads to high training costs, lower-quality models, and significant data forgetting. To address these inefficiencies, we propose the Learn-Focus-Review (LFR) paradigm -- a dynamic training approach that adapts to the model's learning progress. LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset that are more prone to being forgotten, enabling better retention and more efficient learning. Using the LFR paradigm, we pretrained Llama and GPT models on the SlimPajama and OpenWebText datasets, respectively. These models were evaluated on downstream tasks across various domains, including question answering, problem-solving, commonsense reasoning, language modeling, and translation. Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy, while using only 5%--19% of the training tokens. Furthermore, LFR matched the performance of industry-standard Pythia models with up to 2$\times$ the parameter count, using just 3.2% of the training tokens, demonstrating its effectiveness and efficiency. △ Less

Submitted 28 January, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.04880 [pdf, other]

Towards identifying Source credibility on Information Leakage in Digital Gadget Market

Authors: Neha Kumaru, Garvit Gupta, Shreyas Mongia, Shubham Singh, Ponnurangam Kumaraguru, Arun Balaji Buduru

Abstract: The use of Social media to share content is on a constant rise. One of the capsize effect of information sharing on Social media includes the spread of sensitive information on the public domain. With the digital gadget market becoming highly competitive and ever-evolving, the trend of an increasing number of sensitive posts leaking information on devices in social media is observed. Many web-blog… ▽ More The use of Social media to share content is on a constant rise. One of the capsize effect of information sharing on Social media includes the spread of sensitive information on the public domain. With the digital gadget market becoming highly competitive and ever-evolving, the trend of an increasing number of sensitive posts leaking information on devices in social media is observed. Many web-blogs on digital gadget market have mushroomed recently, making the problem of information leak all pervasive. Credible leaks on specifics of an upcoming device can cause a lot of financial damage to the respective organization. Hence, it is crucial to assess the credibility of the platforms that continuously post about a smartphone or digital gadget leaks. In this work, we analyze the headlines of leak web-blog posts and their corresponding official press-release. We first collect 54, 495 leak and press-release headlines for different smartphones. We train our custom NER model to capture the evolving smartphone names with an accuracy of 82.14% on manually annotated results. We further propose a credibility score metric for the web-blog, based on the number of falsified and authentic smartphone leak posts. △ Less

Submitted 7 September, 2024; originally announced September 2024.

arXiv:2408.10239 [pdf, ps, other]

A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

Authors: Neha R. Gupta, Jessica Hullman, Hari Subramonyam

Abstract: Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the ethical implications that appear when designing evaluations of ML systems. For instance, teams may have to balance a trade-off between highly infor… ▽ More Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the ethical implications that appear when designing evaluations of ML systems. For instance, teams may have to balance a trade-off between highly informative tests to ensure downstream product safety, with potential fairness harms inherent to the implemented testing procedures. We conceptualize ethics-related concerns in standard ML evaluation techniques. Specifically, we present a utility framework, characterizing the key trade-off in ethical evaluation as balancing information gain against potential ethical harms. The framework is then a tool for characterizing challenges teams face, and systematically disentangling competing considerations that teams seek to balance. Differentiating between different types of issues encountered in evaluation allows us to highlight best practices from analogous domains, such as clinical trials and automotive crash testing, which navigate these issues in ways that can offer inspiration to improve evaluation processes in ML. Our analysis underscores the critical need for development teams to deliberately assess and manage ethical complexities that arise during the evaluation of ML systems, and for the industry to move towards designing institutional policies to support ethical evaluations. △ Less

Submitted 4 August, 2024; originally announced August 2024.

Showing 1–50 of 275 results for author: Neha