Showing 1–50 of 179 results for author: Jain, V

  1. arXiv:2507.00951  [pdf, ps, other]

    cs.AI

    Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

    Authors: Rizwan Qureshi, Ranjan Sapkota, Abbas Shah, Amgad Muneer, Anas Zafar, Ashmal Vayani, Maged Shoman, Abdelrahman B. M. Eldaly, Kai Zhang, Ferhat Sadak, Shaina Raza, Xinqi Fan, Ravid Shwartz-Ziv, Hong Yan, Vinjia Jain, Aman Chadha, Manoj Karkee, Jia Wu, Philip Torr, Seyedali Mirjalili

    Abstract: Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level…

    Submitted 1 July, 2025; originally announced July 2025.

  2. arXiv:2506.23025  [pdf, ps, other]

    cs.LG cs.AI

    Spectra 1.1: Scaling Laws and Efficient Inference for Ternary Language Models

    Authors: Tejas Vaidhya, Ayush Kaushal, Vineet Jain, Francis Couture Harpin, Prashant Shishodia, Majid Behbahani, Yuriy Nevmyvaka, Irina Rish

    Abstract: Large language models (LLMs) are increasingly used across research and industry applications, yet their inference efficiency remains a significant challenge. As the computational power of modern GPU architectures continuously improves, their memory bandwidth and capacity have not scaled proportionally, creating a critical bottleneck during inference. To address this, we investigate ternary languag…

    Submitted 28 June, 2025; originally announced June 2025.

  3. arXiv:2506.22960  [pdf, ps, other]

    cs.CV

    Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images

    Authors: Shreyas Dixit, Ashhar Aziz, Shashwat Bajpai, Vasu Sharma, Aman Chadha, Vinija Jain, Amitava Das

    Abstract: A report by the European Union Law Enforcement Agency predicts that by 2026, up to 90 percent of online content could be synthetically generated, raising concerns among policymakers, who cautioned that "Generative AI could act as a force multiplier for political disinformation. The combined effect of generative text, images, videos, and audio may surpass the influence of any single modality." In r…

    Submitted 28 June, 2025; originally announced June 2025.

  4. arXiv:2506.22396  [pdf, ps, other]

    cs.CL cs.AI

    QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization

    Authors: Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh

    Abstract: Inference accounts for the majority of latency and energy consumption in large language model (LLM) deployments, often exceeding 90% of total cost. While training-time efficiency has seen extensive progress, runtime optimization remains a key bottleneck, particularly under autoregressive decoding. Existing approaches -- such as pruning, quantization, early exits, and speculative decoding -- often…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Preprint. Under submission

    ACM Class: I.2.0; I.2.7

  5. arXiv:2506.20701  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models

    Authors: Vineet Jain, Kusha Sareen, Mohammad Pedramfar, Siamak Ravanbakhsh

    Abstract: Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases guidance. Moreover, information from past runs is not reused to improve sample quality, resulting in inefficient use of compute. Inspired by the success of Monte Car…

    Submitted 25 June, 2025; originally announced June 2025.

  6. arXiv:2506.13901  [pdf, ps, other]

    cs.CL cs.AI

    Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations

    Authors: Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das

    Abstract: Alignment is no longer a luxury, it is a necessity. As large language models (LLMs) enter high-stakes domains like education, healthcare, governance, and law, their behavior must reliably reflect human-aligned values and safety constraints. Yet current evaluations rely heavily on behavioral proxies such as refusal rates, G-Eval scores, and toxicity classifiers, all of which have critical blind spo…

    Submitted 16 June, 2025; originally announced June 2025.

  7. arXiv:2506.08885  [pdf, ps, other]

    cs.CL cs.LG

    AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)

    Authors: Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das

    Abstract: Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent thereby evading surface level defenses like Direct Preference Optimization (DPO), which remain blind to the latent ge…

    Submitted 11 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  8. arXiv:2505.18988  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results

    Authors: Varun Jain, Zongwei Wu, Quan Zou, Louis Florentin, Henrik Turbell, Sandeep Siddhartha, Radu Timofte, others

    Abstract: This paper presents a comprehensive review of the 1st Challenge on Video Quality Enhancement for Video Conferencing held at the NTIRE workshop at CVPR 2025, and highlights the problem statement, datasets, proposed solutions, and results. The aim of this challenge was to design a Video Quality Enhancement (VQE) model to enhance video quality in video conferencing scenarios by (a) improving lighting…

    Submitted 25 May, 2025; originally announced May 2025.

  9. arXiv:2505.14900  [pdf]

    cs.DB

    Implementing Decentralized Per-Partition Automatic Failover in Azure Cosmos DB

    Authors: Josh Rowe, Mikael Horal, Hari Sudan Sundar, Muthukumaran Arumugam, Burak Kose, Sravani Mitra Palivela, Geni Marsh, Varun Jain, Abhishek Kumar, Dhaval Patel

    Abstract: Azure Cosmos DB is a cloud-native distributed database, operating at a massive scale, powering Microsoft Cloud. Think 10s of millions of database partitions (replica-sets), 100+ PBs of data under management, 20M+ vCores. Failovers are an integral part of distributed databases to provide data availability during outages (partial or full regional outages). While failovers within a replica-set within…

    Submitted 20 May, 2025; originally announced May 2025.

    ACM Class: H.2.4; H.2.7

  10. arXiv:2505.04982  [pdf, other]

    cs.RO eess.SY

    A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation

    Authors: Oscar de Groot, Alberto Bertipaglia, Hidde Boekema, Vishrut Jain, Marcell Kegl, Varun Kotian, Ted Lentsch, Yancong Lin, Chrysovalanto Messiou, Emma Schippers, Farzam Tajdari, Shiming Wang, Zimin Xia, Mubariz Zaffar, Ronald Ensing, Mario Garzon, Javier Alonso-Mora, Holger Caesar, Laura Ferranti, Riender Happee, Julian F. P. Kooij, Georgios Papaioannou, Barys Shyrokau, Dariu M. Gavrila

    Abstract: We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The gu…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Intelligent Vehicles Symposium 2025

  11. arXiv:2504.12401  [pdf, other]

    cs.CV

    NTIRE 2025 Challenge on Event-Based Image Deblurring: Methods and Results

    Authors: Lei Sun, Andrea Alfarano, Peiqi Duan, Shaolin Su, Kaiwei Wang, Boxin Shi, Radu Timofte, Danda Pani Paudel, Luc Van Gool, Qinglin Liu, Wei Yu, Xiaoqian Lv, Lu Yang, Shuigen Wang, Shengping Zhang, Xiangyang Ji, Long Bao, Yuqiang Yang, Jinao Song, Ziyi Wang, Shuang Wen, Heng Sun, Kean Liu, Mingchen Zhong, Senyan Xu , et al. (63 additional authors not shown)

    Abstract: This paper presents an overview of NTIRE 2025 the First Challenge on Event-Based Image Deblurring, detailing the proposed methodologies and corresponding results. The primary goal of the challenge is to design an event-based method that achieves high-quality image deblurring, with performance quantitatively assessed using Peak Signal-to-Noise Ratio (PSNR). Notably, there are no restrictions on com…

    Submitted 16 April, 2025; originally announced April 2025.

  12. arXiv:2503.13517  [pdf, other]

    cs.CL cs.AI

    CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning

    Authors: Hao Cui, Zahra Shamsi, Gowoon Cheon, Xuejian Ma, Shutong Li, Maria Tikhanovskaya, Peter Norgaard, Nayantara Mudur, Martyna Plomecka, Paul Raccuglia, Yasaman Bahri, Victor V. Albert, Pranesh Srinivasan, Haining Pan, Philippe Faist, Brian Rohr, Ekin Dogus Cubuk, Muratahan Aykol, Amil Merchant, Michael J. Statt, Dan Morris, Drew Purves, Elise Kleeman, Ruth Alcantara, Matthew Abraham , et al. (9 additional authors not shown)

    Abstract: Scientific problem-solving involves synthesizing information while applying expert knowledge. We introduce CURIE, a scientific long-Context Understanding, Reasoning and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. This benchmark introduces ten challenging tasks with a total of…

    Submitted 13 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted at ICLR 2025 main conference

  13. arXiv:2503.02618  [pdf, other]

    q-bio.NC cs.CV cs.LG

    ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish

    Authors: Jan-Matthis Lueckmann, Alexander Immer, Alex Bo-Yuan Chen, Peter H. Li, Mariela D. Petkova, Nirmala A. Iyer, Luuk Willem Hesselink, Aparna Dev, Gudrun Ihrke, Woohyun Park, Alyson Petruncio, Aubrey Weigel, Wyatt Korff, Florian Engert, Jeff W. Lichtman, Misha B. Ahrens, Michał Januszewski, Viren Jain

    Abstract: Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we introduce the Zebrafish Activity Prediction Benchmark (ZAPBench) to measure progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4d light-sheet m…

    Submitted 4 March, 2025; originally announced March 2025.

  14. arXiv:2503.02091  [pdf, other]

    cs.SE

    Which Code Statements Implement Privacy Behaviors in Android Applications?

    Authors: Chia-Yi Su, Aakash Bansal, Vijayanta Jain, Sepideh Ghanavati, Sai Teja Peddinti, Collin McMillan

    Abstract: A "privacy behavior" in software is an action where the software uses personal information for a service or a feature, such as a website using location to provide content relevant to a user. Programmers are required by regulations or application stores to provide privacy notices and labels describing these privacy behaviors. Although many tools and research prototypes have been developed to help p…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 16 pages, 8 figures, under review

  15. arXiv:2503.00073  [pdf, other]

    cs.CV cs.LG q-bio.NC

    Forecasting Whole-Brain Neuronal Activity from Volumetric Video

    Authors: Alexander Immer, Jan-Matthis Lueckmann, Alex Bo-Yuan Chen, Peter H. Li, Mariela D. Petkova, Nirmala A. Iyer, Aparna Dev, Gudrun Ihrke, Woohyun Park, Alyson Petruncio, Aubrey Weigel, Wyatt Korff, Florian Engert, Jeff W. Lichtman, Misha B. Ahrens, Viren Jain, Michał Januszewski

    Abstract: Large-scale neuronal activity recordings with fluorescent calcium indicators are increasingly common, yielding high-resolution 2D or 3D videos. Traditional analysis pipelines reduce this data to 1D traces by segmenting regions of interest, leading to inevitable information loss. Inspired by the success of deep learning on minimally processed data in other domains, we investigate the potential of f…

    Submitted 27 February, 2025; originally announced March 2025.

  16. arXiv:2502.03512  [pdf, other]

    cs.AI

    YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment

    Authors: Amitava Das, Yaswanth Narsupalli, Gurpreet Singh, Vinija Jain, Vasu Sharma, Suranjana Trivedy, Aman Chadha, Amit Sheth

    Abstract: Precise alignment in Text-to-Image (T2I) systems is crucial to ensure that generated visuals not only accurately encapsulate user intents but also conform to stringent ethical and aesthetic benchmarks. Incidents like the Google Gemini fiasco, where misaligned outputs triggered significant public backlash, underscore the critical need for robust alignment mechanisms. In contrast, Large Language Mod…

    Submitted 9 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  17. arXiv:2502.01673  [pdf, other]

    cs.CL cs.AI

    Multilingual State Space Models for Structured Question Answering in Indic Languages

    Authors: Arpita Vats, Rahul Raja, Mrinal Mathur, Vinija Jain, Aman Chadha

    Abstract: The diversity and complexity of Indic languages present unique challenges for natural language processing (NLP) tasks, particularly in the domain of question answering (QA). To address these challenges, this paper explores the application of State Space Models (SSMs) to build efficient and contextually aware QA systems tailored for Indic languages. SSMs are particularly suited for this task due to…

    Submitted 24 April, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

    Comments: Accepted at NAACL

  18. arXiv:2501.15747  [pdf, other]

    cs.CL cs.AI

    IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

    Authors: Sankalp KJ, Ashutosh Kumar, Laxmaan Balaji, Nikunj Kotecha, Vinija Jain, Aman Chadha, Sreyoshi Bhaduri

    Abstract: Known by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU Pro…

    Submitted 27 January, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

  19. arXiv:2501.03271  [pdf, other]

    cs.LG cs.AI cs.CL

    DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

    Authors: Amitava Das, Suranjana Trivedy, Danush Khanna, Rajarshi Roy, Gurpreet Singh, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aishwarya Naresh Reganti, Aman Chadha

    Abstract: The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key c…

    Submitted 19 January, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

    MSC Class: 68T45

  20. arXiv:2412.17304  [pdf, other]

    cs.AI

    On the Feasibility of Vision-Language Models for Time-Series Classification

    Authors: Vinay Prithyani, Mohsin Mohammed, Richa Gadgil, Ricardo Buitrago, Vinija Jain, Aman Chadha

    Abstract: We build upon time-series classification by leveraging the capabilities of Vision Language Models (VLMs). We find that VLMs produce competitive results after two or fewer epochs of fine-tuning. We develop a novel approach that incorporates graphical data representations as images in conjunction with numerical data. This approach is rooted in the hypothesis that graphical representations can provide…

    Submitted 17 January, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

  21. arXiv:2412.17131  [pdf, other]

    cs.CL

    LLMsAgainstHate @ NLU of Devanagari Script Languages 2025: Hate Speech Detection and Target Identification in Devanagari Languages via Parameter Efficient Fine-Tuning of LLMs

    Authors: Rushendra Sidibomma, Pransh Patwa, Parth Patwa, Aman Chadha, Vinija Jain, Amitava Das

    Abstract: The detection of hate speech has become increasingly important in combating online hostility and its real-world consequences. Despite recent advancements, there is limited research addressing hate speech detection in Devanagari-scripted languages, where resources and tools are scarce. While large language models (LLMs) have shown promise in language-related tasks, traditional fine-tuning approache…

    Submitted 26 December, 2024; v1 submitted 22 December, 2024; originally announced December 2024.

  22. arXiv:2412.15443  [pdf]

    cs.CL

    SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval

    Authors: Aakash Mahalingam, Vinesh Kumar Gande, Aman Chadha, Vinija Jain, Divya Chaudhary

    Abstract: Retrieval-Augmented Generation (RAG) systems have become pivotal in leveraging vast corpora to generate informed and contextually relevant responses, notably reducing hallucinations in Large Language Models. Despite significant advancements, these systems struggle to efficiently process and retrieve information from large datasets while maintaining a comprehensive understanding of the context. Thi…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 16 pages, 8 figures, Workshop on Generative AI and Knowledge Graphs (GenAIK) at The 31st International Conference on Computational Linguistics (COLING 2025)

    Journal ref: Workshop on Generative AI and Knowledge Graphs (GenAIK) at The 31st International Conference on Computational Linguistics (COLING 2025)

  23. arXiv:2412.13935  [pdf, other]

    cs.LG cs.AI

    Spatio-Temporal Forecasting of PM2.5 via Spatial-Diffusion guided Encoder-Decoder Architecture

    Authors: Malay Pandey, Vaishali Jain, Nimit Godhani, Sachchida Nand Tripathi, Piyush Rai

    Abstract: In many problem settings that require spatio-temporal forecasting, the values in the time-series not only exhibit spatio-temporal correlations but are also influenced by spatial diffusion across locations. One such example is forecasting the concentration of fine particulate matter (PM2.5) in the atmosphere which is influenced by many complex factors, the most important ones being diffusion due to…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: 9 pages, 4 figures, International Conference on Data Science and Management of Data (CODS-COMAD), IIT Jodhpur, 2024

  24. arXiv:2412.00869  [pdf, other]

    cs.CL cs.AI

    KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting

    Authors: Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, Amit Sheth

    Abstract: Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as <blank> is to <blank>" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares…

    Submitted 18 December, 2024; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: Accepted at COLING 2025

  25. arXiv:2411.16754  [pdf, other]

    cs.CV cs.AI

    Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)

    Authors: Nasrin Imanpour, Shashwat Bajpai, Subhankar Ghosh, Sainath Reddy Sankepally, Abhilekh Borah, Hasnat Md Abdullah, Nishoak Kosaraju, Shreyas Dixit, Ashhar Aziz, Shwetangshu Biswas, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das

    Abstract: The proliferation of AI techniques for image generation, coupled with their increasing accessibility, has raised significant concerns about the potential misuse of these images to spread misinformation. Recent AI-generated image detection (AGID) methods include CNNDetection, NPR, DM Image Detection, Fake Image Detection, DIRE, LASTED, GAN Image Detection, AIDE, SSP, DRCT, RINE, OCC-CLIP, De-Fake,…

    Submitted 24 November, 2024; originally announced November 2024.

    Comments: 13 pages, 9 figures

  26. arXiv:2411.10867  [pdf, other]

    cs.CV cs.AI

    ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models

    Authors: Vipula Rawte, Sarthak Jain, Aarush Sinha, Garv Kaushik, Aman Bansal, Prathiksha Rumale Vishwanath, Samyak Rajesh Jain, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha, Amit P. Sheth, Amitava Das

    Abstract: Recent advances in Large Multimodal Models (LMMs) have expanded their capabilities to video understanding, with Text-to-Video (T2V) models excelling in generating videos from textual prompts. However, they still frequently produce hallucinated content, revealing AI-generated inconsistencies. We introduce ViBe (https://vibe-t2v-bench.github.io/): a large-scale dataset of hallucinated videos from op…

    Submitted 19 March, 2025; v1 submitted 16 November, 2024; originally announced November 2024.

  27. arXiv:2410.19419  [pdf, other]

    cs.CL

    KAHANI: Culturally-Nuanced Visual Storytelling Tool for Non-Western Cultures

    Authors: Hamna, Deepthi Sudharsan, Agrima Seth, Ritvik Budhiraja, Deepika Khullar, Vyshak Jain, Kalika Bali, Aditya Vashistha, Sameer Segal

    Abstract: Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To ad…

    Submitted 11 March, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: Under review

  28. arXiv:2410.18932  [pdf, other]

    cs.RO cs.AI cs.CV

    ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation

    Authors: Vidhi Jain, Rishi Veerapaneni, Yonatan Bisk

    Abstract: We propose Audio Noise Awareness using Visuals of Indoors for NAVIgation for quieter robot path planning. While humans are naturally aware of the noise they make and its impact on those around them, robots currently lack this awareness. A key challenge in achieving audio awareness for robots is estimating how loud the robot's actions will be at a listener's location. Since sound depends upon the g…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 8th Conference on Robot Learning (CoRL) 2024

  29. arXiv:2410.04236  [pdf, other]

    cs.CL cs.AI cs.LG

    Overview of Factify5WQA: Fact Verification through 5W Question-Answering

    Authors: Suryavardan Suresh, Anku Rani, Parth Patwa, Aishwarya Reganti, Vinija Jain, Aman Chadha, Amitava Das, Amit Sheth, Asif Ekbal

    Abstract: Researchers have found that fake news spreads many times faster than real news. This is a major problem, especially in today's world where social media is the key source of news for many among the younger population. Fact verification, thus, becomes an important task and many media sites contribute to the cause. Manual fact verification is a tedious task, given the volume of fake news online. The…

    Submitted 5 October, 2024; originally announced October 2024.

    Comments: Accepted at defactify3@aaai2024

  30. arXiv:2410.01312  [pdf, other]

    cs.LG

    Sampling from Energy-based Policies using Diffusion

    Authors: Vineet Jain, Tara Akhound-Sadegh, Siamak Ravanbakhsh

    Abstract: Energy-based policies offer a flexible framework for modeling complex, multimodal behaviors in reinforcement learning (RL). In maximum entropy RL, the optimal policy is a Boltzmann distribution derived from the soft Q-function, but direct sampling from this distribution in continuous action spaces is computationally intractable. As a result, existing methods typically use simpler parametric distri…

    Submitted 2 October, 2024; originally announced October 2024.

  31. arXiv:2409.11654  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.NC

    How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

    Authors: Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B. Burkhardt, Andrea Califano, Jonah Cool, Abby F. Dernburg, Kirsty Ewing, Emily B. Fox, Matthias Haury, Amy E. Herr, Eric Horvitz, Patrick D. Hsu, Viren Jain, Gregory R. Johnson, Thomas Kalil, David R. Kelley, Shana O. Kelley, Anna Kreshuk , et al. (17 additional authors not shown)

    Abstract: The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision…

    Submitted 14 October, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  32. arXiv:2409.09269  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types

    Authors: Neelabh Sinha, Vinija Jain, Aman Chadha

    Abstract: Visual Question-Answering (VQA) has become key to user experience, particularly after improved generalization capabilities of Vision-Language Models (VLMs). But evaluating VLMs for an application requirement using a standardized framework in practical settings is still challenging. This paper aims to solve that using an end-to-end framework. We present VQA360 - a novel dataset derived from establi…

    Submitted 12 December, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted at The First Workshop of Evaluation of Multi-Modal Generation (EvalMG) in 31st International Conference on Computational Linguistics (COLING), 2025. 8 pages + references + 6 pages of Appendix

  33. arXiv:2408.10446  [pdf, other]

    cs.CV cs.AI

    The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks

    Authors: Niyar R Barman, Krish Sharma, Ashhar Aziz, Shashwat Bajpai, Shwetangshu Biswas, Vasu Sharma, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das

    Abstract: The rapid advancement of text-to-image generation systems, exemplified by models like Stable Diffusion, Midjourney, Imagen, and DALL-E, has heightened concerns about their potential misuse. In response, companies like Meta and Google have intensified their efforts to implement watermarking techniques on AI-generated images to curb the circulation of potentially misleading visuals. However, in this…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 23 pages and 10 figures

  34. arXiv:2408.03466  [pdf, ps, other]

    cs.DS math.CO

    Rapid mixing of the down-up walk on matchings of a fixed size

    Authors: Vishesh Jain, Clayton Mizgerd

    Abstract: Let $G = (V,E)$ be a graph on $n$ vertices and let $m^*(G)$ denote the size of a maximum matching in $G$. We show that for any $δ > 0$ and for any $1 \leq k \leq (1-δ)m^*(G)$, the down-up walk on matchings of size $k$ in $G$ mixes in time polynomial in $n$. Previously, polynomial mixing was not known even for graphs with maximum degree $Δ$, and our result makes progress on a conjecture of Jain, Per…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 12 pages; comments welcome

  35. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman , et al. (173 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 October, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  36. arXiv:2407.21602  [pdf, other]

    cs.LG math.DS physics.comp-ph physics.flu-dyn

    Higher order quantum reservoir computing for non-intrusive reduced-order models

    Authors: Vinamr Jain, Romit Maulik

    Abstract: Forecasting dynamical systems is of importance to numerous real-world applications. When possible, dynamical systems forecasts are constructed based on first-principles-based models such as through the use of differential equations. When these equations are unknown, non-intrusive techniques must be utilized to build predictive models from data alone. Machine learning (ML) methods have recently bee…

    Submitted 31 July, 2024; originally announced July 2024.

  37. arXiv:2407.15266  [pdf, other]

    cs.NI

    STrack: A Reliable Multipath Transport for AI/ML Clusters

    Authors: Yanfang Le, Rong Pan, Peter Newman, Jeremias Blendin, Abdul Kabbani, Vipin Jain, Raghava Sivaramu, Francis Matus

    Abstract: Emerging artificial intelligence (AI) and machine learning (ML) workloads present new challenges of managing the collective communication used in distributed training across hundreds or even thousands of GPUs. This paper presents STrack, a novel hardware-offloaded reliable transport protocol aimed at improving the performance of AI/ML workloads by rethinking key aspects of the transport layer. ST…

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  38. arXiv:2407.06939  [pdf, other]

    cs.RO cs.CV

    Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

    Authors: Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma, et al. (20 additional authors not shown)

    Abstract: In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface withi…

    Submitted 9 July, 2024; originally announced July 2024.

  39. arXiv:2406.13564  [pdf, other]

    cs.CV cs.AI

    Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor

    Authors: Veedant Jain, Felipe dos Santos Alves Feitosa, Gabriel Kreiman

    Abstract: Despite significant advancements in computer vision, understanding complex scenes, particularly those involving humor, remains a substantial challenge. This paper introduces HumorDB, a novel image-only dataset specifically designed to advance visual humor understanding. HumorDB consists of meticulously curated image pairs with contrasting humor ratings, emphasizing subtle visual cues that trigger…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 7 main figures, 5 additional appendix figures

    ACM Class: I.5.4

  40. arXiv:2406.12644  [pdf, other]

    cs.CL cs.AI

    Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles

    Authors: Devichand Budagam, Ashutosh Kumar, Mahsa Khoshnoodi, Sankalp KJ, Vinija Jain, Aman Chadha

    Abstract: Assessing the effectiveness of large language models (LLMs) in performing different tasks is crucial for understanding their strengths and weaknesses. This paper presents Hierarchical Prompting Taxonomy (HPT), grounded on human cognitive principles and designed to assess LLMs by examining the cognitive demands of various tasks. The HPT utilizes the Hierarchical Prompting Framework (HPF), which str…

    Submitted 11 December, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  41. arXiv:2406.11402  [pdf, other]

    cs.CL cs.AI cs.LG

    Are Small Language Models Ready to Compete with Large Language Models for Practical Applications?

    Authors: Neelabh Sinha, Vinija Jain, Aman Chadha

    Abstract: The rapid rise of Language Models (LMs) has expanded their use in several applications. Yet, due to constraints of model size, associated cost, or proprietary restrictions, utilizing state-of-the-art (SOTA) LLMs is not always feasible. With open, smaller LMs emerging, more applications can leverage their capabilities, but selecting the right LM can be challenging as smaller LMs do not perform well…

    Submitted 12 March, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at The Fifth Workshop on Trustworthy Natural Language Processing (TrustNLP 2025) in Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025. 8 pages + references + Appendix

  42. arXiv:2406.11109  [pdf, other]

    cs.CL cs.AI cs.LG

    Investigating Annotator Bias in Large Language Models for Hate Speech Detection

    Authors: Amit Das, Zheng Zhang, Najib Hasan, Souvika Sarkar, Fatemeh Jamshidi, Tathagata Bhattacharya, Mostafa Rahgouy, Nilanjana Raychawdhary, Dongji Feng, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals

    Abstract: Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs) presents a unique opportunity to modernize and streamline this complex procedure. While existing researc…

    Submitted 16 November, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at NeurIPS Safe Generative AI Workshop, 2024

  43. arXiv:2406.09559  [pdf, other]

    cs.CL cs.AI cs.LG

    Decoding the Diversity: A Review of the Indic AI Research Landscape

    Authors: Sankalp KJ, Vinija Jain, Sreyoshi Bhaduri, Tamoghna Roy, Aman Chadha

    Abstract: This review paper provides a comprehensive overview of large language model (LLM) research directions within Indic languages. Indic languages are those spoken in the Indian subcontinent, including India, Pakistan, Bangladesh, Sri Lanka, Nepal, and Bhutan, among others. These languages have a rich cultural and linguistic heritage and are spoken by over 1.5 billion people worldwide. With the tremend…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 27 pages, 1 figure

  44. arXiv:2405.17475  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    How Culturally Aware are Vision-Language Models?

    Authors: Olena Burda-Lassen, Aman Chadha, Shashank Goswami, Vinija Jain

    Abstract: An image is often considered worth a thousand words, and certain images can tell rich and insightful stories. Can these stories be told via image captioning? Images from folklore genres, such as mythology, folk dance, cultural signs, and symbols, are vital to every culture. Our research compares the performance of four popular vision-language models (GPT-4V, Gemini Pro Vision, LLaVA, and OpenFlami…

    Submitted 8 February, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

  45. arXiv:2405.13019  [pdf, other]

    cs.CL cs.AI

    A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models

    Authors: Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha

    Abstract: Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey…

    Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  46. arXiv:2405.09589  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.SD eess.AS

    A Comprehensive Survey of Hallucination in Large Language, Image, Video and Audio Foundation Models

    Authors: Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha

    Abstract: The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the b…

    Submitted 3 October, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: EMNLP 2024 Findings

  47. arXiv:2404.13506  [pdf, other]

    cs.LG cs.AI cs.CL

    Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications

    Authors: Charith Chandra Sai Balne, Sreyoshi Bhaduri, Tamoghna Roy, Vinija Jain, Aman Chadha

    Abstract: The rise of deep learning has marked significant progress in fields such as computer vision, natural language processing, and medical imaging, primarily through the adaptation of pre-trained models for specific tasks. Traditional fine-tuning methods, involving adjustments to all parameters, face challenges due to high computational and memory demands. This has led to the development of Parameter E…

    Submitted 23 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  48. arXiv:2404.07214  [pdf, other]

    cs.CV cs.AI cs.CL

    Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

    Authors: Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha

    Abstract: The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this constraint, researchers have endeavored to integrate visual capabilities with LLMs, resulting in the emergence of Vision-Language Models (VLMs). These advanced…

    Submitted 12 April, 2024; v1 submitted 20 February, 2024; originally announced April 2024.

    Comments: The most extensive and up to date Survey on Visual Language Models covering 76 Visual Language Models

  49. arXiv:2403.16422  [pdf, other]

    cs.CV cs.AI

    Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

    Authors: Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo

    Abstract: Over the past few years, Text-to-Image (T2I) generation approaches based on diffusion models have gained significant attention. However, vanilla diffusion models often suffer from spelling inaccuracies in the text displayed within the generated images. The capability to generate visual text is crucial, offering both academic interest and a wide range of practical applications. To produce accurate…

    Submitted 28 October, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted at WACV 2025

  50. arXiv:2403.14633  [pdf, other]

    cs.CY cs.AI cs.CL

    Born With a Silver Spoon? Investigating Socioeconomic Bias in Large Language Models

    Authors: Smriti Singh, Shuvam Keshari, Vinija Jain, Aman Chadha

    Abstract: Socioeconomic bias in society exacerbates disparities, influencing access to opportunities and resources based on individuals' economic and social backgrounds. This pervasive issue perpetuates systemic inequalities, hindering the pursuit of inclusive progress as a society. In this paper, we investigate the presence of socioeconomic bias, if any, in large language models. To this end, we introduce…

    Submitted 19 December, 2024; v1 submitted 16 February, 2024; originally announced March 2024.