Showing 1–50 of 170 results for author: Verma, S

  1. arXiv:2507.06261  [pdf, ps, other]

    cs.CL cs.AI

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu, et al. (3278 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 72 pages, 17 figures

  2. arXiv:2507.05577  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA

    Authors: Shashank Verma, Fengyi Jiang, Xiangning Xue

    Abstract: Biomedical semantic question answering rooted in information retrieval can play a crucial role in keeping up to date with vast, rapidly evolving and ever-growing biomedical literature. A robust system can help researchers, healthcare professionals and even layman users access relevant knowledge grounded in evidence. The BioASQ 2025 Task13b Challenge serves as an important benchmark, offering a com…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Paper submitted to CLEF 2025 CEUR-WS

  3. arXiv:2506.22850  [pdf, ps, other]

    cs.CV

    DMD-Net: Deep Mesh Denoising Network

    Authors: Aalok Gangopadhyay, Shashikant Verma, Shanmuganathan Raman

    Abstract: We present Deep Mesh Denoising Network (DMD-Net), an end-to-end deep learning framework, for solving the mesh denoising problem. DMD-Net consists of a Graph Convolutional Neural Network in which aggregation is performed in both the primal as well as the dual graph. This is realized in the form of an asymmetric two-stream network, which contains a primal-dual fusion block that enables communication…

    Submitted 28 June, 2025; originally announced June 2025.

  4. arXiv:2506.22833  [pdf, ps, other]

    cs.CV

    SemFaceEdit: Semantic Face Editing on Generative Radiance Manifolds

    Authors: Shashikant Verma, Shanmuganathan Raman

    Abstract: Despite multiple view consistency offered by 3D-aware GAN techniques, the resulting images often lack the capacity for localized editing. In response, generative radiance manifolds emerge as an efficient approach for constrained point sampling within volumes, effectively reducing computational demands and enabling the learning of fine details. This work introduces SemFaceEdit, a novel method that…

    Submitted 28 June, 2025; originally announced June 2025.

  5. An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify

    Authors: Shivam Verma, Vivian Chen, Darren Mei

    Abstract: Spotify, a large-scale multimedia platform, attracts over 675 million monthly active users who collectively consume millions of hours of music, podcasts, audiobooks, and video content. This diverse content consumption pattern introduces unique challenges for computational advertising, which must effectively integrate a variety of ad modalities, including audio, video, and display, within a single…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted at KDD 2025

    ACM Class: H.3.3; I.2.1; I.2.6

  6. arXiv:2506.00671  [pdf, ps, other]

    cs.CL

    DeepRAG: Integrating Hierarchical Reasoning and Process Supervision for Biomedical Multi-Hop QA

    Authors: Yuelyu Ji, Hang Zhang, Shiven Verma, Hui Ji, Chun Li, Yushui Han, Yanshan Wang

    Abstract: We propose DeepRAG, a novel framework that integrates DeepSeek hierarchical question decomposition capabilities with RAG Gym unified retrieval-augmented generation optimization using process level supervision. Targeting the challenging MedHopQA biomedical question answering task, DeepRAG systematically decomposes complex queries into precise sub-queries and employs concept level reward signals inf…

    Submitted 31 May, 2025; originally announced June 2025.

  7. arXiv:2505.23856  [pdf, ps, other]

    cs.CL cs.AI cs.HC cs.LG

    OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities

    Authors: Sahil Verma, Keegan Hines, Jeff Bilmes, Charlotte Siska, Luke Zettlemoyer, Hila Gonen, Chandan Singh

    Abstract: The emerging capabilities of large language models (LLMs) have sparked concerns about their immediate potential for harmful misuse. The core approach to mitigate these concerns is the detection of harmful queries to the model. Current detection approaches are fallible, and are particularly susceptible to attacks that exploit mismatched generalization of model capabilities (e.g., prompts in low-res…

    Submitted 29 May, 2025; originally announced May 2025.

  8. arXiv:2505.09353  [pdf, other]

    cs.FL

    Deterministic Suffix-reading Automata

    Authors: R Keerthan, B Srivathsan, R Venkatesh, Sagar Verma

    Abstract: We introduce deterministic suffix-reading automata (DSA), a new automaton model over finite words. Transitions in a DSA are labeled with words. From a state, a DSA triggers an outgoing transition on seeing a word ending with the transition's label. Therefore, rather than moving along an input word letter by letter, a DSA can jump along blocks of letters, with each block ending in a suitable suffix…

    Submitted 19 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

    Comments: Extended version of arXiv:2410.22761

    MSC Class: 68Q45

    ACM Class: F.1.1

  9. arXiv:2505.02124  [pdf, other]

    cs.LG

    GRAIL: Graph Edit Distance and Node Alignment Using LLM-Generated Code

    Authors: Samidha Verma, Arushi Goyal, Ananya Mathur, Ankit Anand, Sayan Ranu

    Abstract: Graph Edit Distance (GED) is a widely used metric for measuring similarity between two graphs. Computing the optimal GED is NP-hard, leading to the development of various neural and non-neural heuristics. While neural methods have achieved improved approximation quality compared to non-neural approaches, they face significant challenges: (1) They require large amounts of ground truth data, which i…

    Submitted 4 May, 2025; originally announced May 2025.

  10. arXiv:2505.01475  [pdf, other]

    cs.SE cs.AI

    CodeSSM: Towards State Space Models for Code Understanding

    Authors: Shweta Verma, Abhinav Anand, Mira Mezini

    Abstract: Although transformers are widely used for various code-specific tasks, they have some significant limitations. In this paper, we investigate State Space Models (SSMs) as a potential alternative to transformers for code understanding tasks, such as code retrieval, classification, and clone detection. Previous research has already demonstrated that SSMs are more compute-efficient than transformers.…

    Submitted 21 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

  11. arXiv:2505.00204  [pdf, other]

    cs.AI

    RAIL in the Wild: Operationalizing Responsible AI Evaluation Using Anthropic's Value Dataset

    Authors: Sumit Verma, Pritam Prasun, Arpit Jaiswal, Pritish Kumar

    Abstract: As AI systems become embedded in real-world applications, ensuring they meet ethical standards is crucial. While existing AI ethics frameworks emphasize fairness, transparency, and accountability, they often lack actionable evaluation methods. This paper introduces a systematic approach using the Responsible AI Labs (RAIL) framework, which includes eight measurable dimensions to assess the normati…

    Submitted 30 April, 2025; originally announced May 2025.

  12. arXiv:2504.07463  [pdf, ps, other]

    cs.AI

    Enhanced Question-Answering for Skill-based learning using Knowledge-based AI and Generative AI

    Authors: Rahul K. Dass, Rochan H. Madhusudhana, Erin C. Deye, Shashank Verma, Timothy A. Bydlon, Grace Brazil, Ashok K. Goel

    Abstract: Supporting learners' understanding of taught skills in online settings is a longstanding challenge. While exercises and chat-based agents can evaluate understanding in limited contexts, this challenge is magnified when learners seek explanations that delve into procedural knowledge (how things are done) and reasoning (why things happen). We hypothesize that an intelligent agent's ability to unders…

    Submitted 10 April, 2025; originally announced April 2025.

  13. arXiv:2504.06581  [pdf, other]

    cs.AI

    Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis

    Authors: Umakanta Maharana, Sarthak Verma, Avarna Agarwal, Prakashini Mruthyunjaya, Dwarikanath Mahapatra, Sakir Ahmed, Murari Mandal

    Abstract: Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. The early diagnosis of various diseases continues to be a significant challenge in healthcare, primarily due to the nonspecific nature of early symptoms, the shortage of expert medical practitioners, and the need for prolonge…

    Submitted 9 April, 2025; originally announced April 2025.

  14. arXiv:2503.23989  [pdf, ps, other]

    cs.SE cs.AI

    Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics

    Authors: Aditya Pathak, Rachit Gandhi, Vaibhav Uttam, Devansh, Yashwanth Nakka, Aaryan Raj Jindal, Pratyush Ghosh, Arnav Ramamoorthy, Shreyash Verma, Aditya Mittal, Aashna Ased, Chirag Khatri, Jagat Sesh Challa, Dhruv Kumar

    Abstract: Since the emergence of Large Language Models (LLMs) popularized by the release of GPT-3 and ChatGPT, LLMs have shown remarkable promise in programming-related tasks. While code generation using LLMs has become a popular field of research, code evaluation using LLMs remains under-explored. In this paper, we focus on LLM-based code evaluation and attempt to fill in the existing gaps. We propose mult…

    Submitted 22 June, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

    Comments: Accepted in ICER 2025

  15. arXiv:2503.21674  [pdf, other]

    cs.CR cs.AI cs.NI

    Intelligent IoT Attack Detection Design via ODLLM with Feature Ranking-based Knowledge Base

    Authors: Satvik Verma, Qun Wang, E. Wes Bethel

    Abstract: The widespread adoption of Internet of Things (IoT) devices has introduced significant cybersecurity challenges, particularly with the increasing frequency and sophistication of Distributed Denial of Service (DDoS) attacks. Traditional machine learning (ML) techniques often fall short in detecting such attacks due to the complexity of blended and evolving patterns. To address this, we propose a no…

    Submitted 27 March, 2025; originally announced March 2025.

  16. arXiv:2503.13344  [pdf, other]

    cs.CV

    STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans

    Authors: Shashikant Verma, Harish Katti, Soumyaratna Debnath, Yamuna Swamy, Shanmuganathan Raman

    Abstract: We introduce STEP, a novel framework utilizing Transformer-based discriminative model prediction for simultaneous tracking and estimation of pose across diverse animal species and humans. We are inspired by the fact that the human brain exploits spatiotemporal continuity and performs concurrent localization and pose estimation despite the specialization of brain areas for form and motion processin…

    Submitted 20 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  17. arXiv:2503.13112  [pdf, other]

    math.CO cs.DS

    Connected Partitions via Connected Dominating Sets

    Authors: Aikaterini Niklanovits, Kirill Simonov, Shaily Verma, Ziena Zeif

    Abstract: The classical theorem due to Győri and Lovász states that any $k$-connected graph $G$ admits a partition into $k$ connected subgraphs, where each subgraph has a prescribed size and contains a prescribed vertex, as long as the total size of target subgraphs is equal to the size of $G$. However, this result is notoriously evasive in terms of efficient constructions, and it is still unknown whether s…

    Submitted 15 May, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  18. arXiv:2503.08290  [pdf, other]

    cs.CV

    SegDesicNet: Lightweight Semantic Segmentation in Remote Sensing with Geo-Coordinate Embeddings for Domain Adaptation

    Authors: Sachin Verma, Frank Lindseth, Gabriel Kiss

    Abstract: Semantic segmentation is essential for analyzing high-definition remote sensing images (HRSIs) because it allows the precise classification of objects and regions at the pixel level. However, remote sensing data present challenges owing to geographical location, weather, and environmental variations, making it difficult for semantic segmentation models to generalize across diverse scenarios. Existi…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: https://openaccess.thecvf.com/content/WACV2025/papers/Verma_SegDesicNet_Lightweight_Semantic_Segmentation_in_Remote_Sensing_with_Geo-Coordinate_Embeddings_WACV_2025_paper.pdf

    Journal ref: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025

  19. arXiv:2502.11951  [pdf, other]

    cs.CE cs.LG quant-ph

    Qubit-Based Framework for Quantum Machine Learning: Bridging Classical Data and Quantum Algorithms

    Authors: Bhavna Bose, Saurav Verma

    Abstract: This paper dives into the exciting and rapidly growing field of quantum computing, explaining its core ideas, current progress, and how it could revolutionize the way we solve complex problems. It starts by breaking down the basics, like qubits, quantum circuits, and how principles like superposition and entanglement make quantum computers fundamentally different, and far more powerful for certain…

    Submitted 17 February, 2025; originally announced February 2025.

  20. arXiv:2502.09724  [pdf, other]

    cs.LG

    Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

    Authors: Cheol Woo Kim, Jai Moondra, Shresth Verma, Madeleine Pollack, Lingkai Kong, Milind Tambe, Swati Gupta

    Abstract: In many real-world applications of reinforcement learning (RL), deployed policies have varied impacts on different stakeholders, creating challenges in reaching consensus on how to effectively aggregate their preferences. Generalized $p$-means form a widely used class of social welfare functions for this purpose, with broad applications in fair resource allocation, AI alignment, and decision-makin…

    Submitted 13 February, 2025; originally announced February 2025.

  21. arXiv:2502.06048  [pdf, other]

    cs.CC

    A Parameterized Study of Secluded Structures in Directed Graphs

    Authors: Nadym Mallek, Jonas Schmidt, Shaily Verma

    Abstract: Given an undirected graph $G$ and an integer $k$, the Secluded $Π$-Subgraph problem asks you to find a maximum size induced subgraph that satisfies a property $Π$ and has at most $k$ neighbors in the rest of the graph. This problem has been extensively studied; however, there is no prior study of the problem in directed graphs. This question has been mentioned by Jansen et al. [ISAAC'23]. In thi…

    Submitted 9 February, 2025; originally announced February 2025.

  22. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  23. arXiv:2501.10784  [pdf, other]

    cs.LG

    Measuring Fairness in Financial Transaction Machine Learning Models

    Authors: Deniz Sezin Ayvaz, Lorenzo Belenguer, Hankun He, Deborah Dormah Kanubala, Mingxu Li, Soung Low, Carlos Mougan, Faithful Chiagoziem Onwuegbuche, Yulu Pi, Natalia Sikora, Dan Tran, Shresth Verma, Hanzhi Wang, Skyler Xie, Adeline Pelletier

    Abstract: Mastercard, a global leader in financial services, develops and deploys machine learning models aimed at optimizing card usage and preventing attrition through advanced predictive models. These models use aggregated and anonymized card usage patterns, including cross-border transactions and industry-specific spending, to tailor bank offerings and maximize revenue opportunities. Mastercard has esta…

    Submitted 22 January, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

    Comments: Mastercard Data Study Group Alan Turing Institute: https://www.turing.ac.uk/news/publications/data-study-group-final-report-mastercard

  24. arXiv:2501.01174  [pdf, other]

    cs.CV cs.AI

    L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild

    Authors: Soumyaratna Debnath, Harish Katti, Shashikant Verma, Shanmuganathan Raman

    Abstract: While 2D pose estimation has advanced our ability to interpret body movements in animals and primates, it is limited by the lack of depth information, constraining its application range. 3D pose estimation provides a more comprehensive solution by incorporating spatial depth, yet creating extensive 3D pose datasets for animals is challenging due to their dynamic and unpredictable behaviours in nat…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

  25. arXiv:2412.17684  [pdf, other]

    cs.LG

    COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation

    Authors: Arnav M. Das, Gantavya Bhatt, Lilly Kumari, Sahil Verma, Jeff Bilmes

    Abstract: Retrieval augmentation, the practice of retrieving additional data from large auxiliary pools, has emerged as an effective technique for enhancing model performance in the low-data regime. Prior approaches have employed only nearest-neighbor based strategies for data selection, which retrieve auxiliary samples with high similarity to instances in the target task. However, these approaches are pron…

    Submitted 28 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: Accepted at CVPR 2025

  26. arXiv:2412.05649  [pdf, other]

    cs.NI

    RouteNet-Fermi: Network Modeling With GNN (Analysis And Re-implementation)

    Authors: Shourya Verma, Simran Kadadi, Swathi Jayaprakash, Arpan Kumar Mahapatra, Ishaan Jain

    Abstract: Network performance modeling presents important challenges in modern computer networks due to increasing complexity, scale, and diverse traffic patterns. While traditional approaches like queuing theory and packet-level simulation have served as foundational tools, they face limitations in modeling complex traffic behaviors and scaling to large networks. This project presents an extended implement…

    Submitted 7 December, 2024; originally announced December 2024.

  27. arXiv:2412.01935  [pdf, other]

    cs.LG cs.AI

    Cross Domain Adaptation using Adversarial networks with Cyclic loss

    Authors: Manpreet Kaur, Ankur Tomar, Srijan Mishra, Shashwat Verma

    Abstract: Deep Learning methods are highly local and sensitive to the domain of data they are trained with. Even a slight deviation from the domain distribution affects prediction accuracy of deep networks significantly. In this work, we have investigated a set of techniques aimed at increasing accuracy of generator networks which perform translation from one domain to the other in an adversarial setting. I…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: 16 pages, 14 figures

  28. arXiv:2411.02538  [pdf, other]

    cs.CL

    MILU: A Multi-task Indic Language Understanding Benchmark

    Authors: Sshubam Verma, Mohammed Safi Ur Rahman Khan, Vishwajeet Kumar, Rudra Murthy, Jaydeep Sen

    Abstract: Evaluating Large Language Models (LLMs) in low-resource and linguistically diverse languages remains a significant challenge in NLP, particularly for languages using non-Latin scripts like those spoken in India. Existing benchmarks predominantly focus on English, leaving substantial gaps in assessing LLM capabilities in these languages. We introduce MILU, a Multi-task Indic Language Understanding…

    Submitted 4 February, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

  29. arXiv:2411.00780  [pdf, other]

    cs.IR

    Proactive Detection and Calibration of Seasonal Advertisements with Multimodal Large Language Models

    Authors: Hamid Eghbalzadeh, Shuai Shao, Saurabh Verma, Venugopal Mani, Hongnan Wang, Jigar Madia, Vitali Karpinchyk, Andrey Malevich

    Abstract: A myriad of factors affect large scale ads delivery systems and influence both user experience and revenue. One such factor is proactive detection and calibration of seasonal advertisements to help with increasing conversion and user satisfaction. In this paper, we present Proactive Detection and Calibration of Seasonal Advertisements (PDCaSA), a research problem that is of interest for the ads ra…

    Submitted 16 October, 2024; originally announced November 2024.

  30. Deterministic Suffix-reading Automata

    Authors: R Keerthan, B Srivathsan, R Venkatesh, Sagar Verma

    Abstract: We introduce deterministic suffix-reading automata (DSA), a new automaton model over finite words. Transitions in a DSA are labeled with words. From a state, a DSA triggers an outgoing transition on seeing a word ending with the transition's label. Therefore, rather than moving along an input word letter by letter, a DSA can jump along blocks of letters, with each block ending in a suitable suffix…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: In Proceedings GandALF 2024, arXiv:2410.21884

    ACM Class: F.1.1

    Journal ref: EPTCS 409, 2024, pp. 70-87

  31. arXiv:2410.20629  [pdf, other]

    cs.DM cs.DS

    Parameterized Saga of First-Fit and Last-Fit Coloring

    Authors: Akanksha Agrawal, Daniel Lokshtanov, Fahad Panolan, Saket Saurabh, Shaily Verma

    Abstract: The classic greedy coloring (first-fit) algorithm considers the vertices of an input graph $G$ in a given order and assigns the first available color to each vertex $v$ in $G$. In the Grundy Coloring problem, the task is to find an ordering of the vertices that will force the greedy algorithm to use as many colors as possible. In the Partial Grundy Coloring, the task is also to color t…

    Submitted 27 October, 2024; originally announced October 2024.

  32. arXiv:2410.15002  [pdf, other]

    cs.CV

    How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold

    Authors: Sahil Verma, Royi Rassin, Arnav Das, Gantavya Bhatt, Preethi Seshadri, Chirag Shah, Jeff Bilmes, Hannaneh Hajishirzi, Yanai Elazar

    Abstract: Text-to-image models are trained using large datasets collected by scraping image-text pairs from the internet. These datasets often include private, copyrighted, and licensed material. Training models on such datasets enables them to generate images with such content, which might violate copyright laws and individual privacy. This phenomenon is termed imitation -- generation of images with conten…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: Accepted at ATTRIB, RegML, and SafeGenAI workshops at NeurIPS 2024 and NLLP Workshop 2024

  33. arXiv:2410.14702  [pdf, other]

    cs.AI cs.CL

    Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

    Authors: Himanshu Gupta, Shreyas Verma, Ujjwala Anantheswaran, Kevin Scaria, Mihir Parmar, Swaroop Mishra, Chitta Baral

    Abstract: Multi-modal Large Language Models (MLLMs) exhibit impressive problem-solving abilities in various domains, but their visual comprehension and abstract reasoning skills remain under-evaluated. To this end, we present PolyMATH, a challenging benchmark aimed at evaluating the general cognitive reasoning abilities of MLLMs. PolyMATH comprises 5,000 manually collected high-quality images of cognitive t…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: 49 pages, (10 pages paper, 9 pages references, 30 pages appendix)

  34. arXiv:2409.16984  [pdf, other]

    cs.AI cs.CL

    AXCEL: Automated eXplainable Consistency Evaluation using LLMs

    Authors: P Aditya Sreekar, Sahil Verma, Suransh Chopra, Sarik Ghazarian, Abhishek Persad, Narayanan Sadagopan

    Abstract: Large Language Models (LLMs) are widely used in both industry and academia for various tasks, yet evaluating the consistency of generated text responses continues to be a challenge. Traditional metrics like ROUGE and BLEU show a weak correlation with human judgment. More sophisticated metrics using Natural Language Inference (NLI) have shown improved correlations but are complex to implement, requ…

    Submitted 25 September, 2024; originally announced September 2024.

  35. arXiv:2409.13406  [pdf, other]

    cs.LG

    Credit Card Fraud Detection: A Deep Learning Approach

    Authors: Sourav Verma, Joydip Dhar

    Abstract: The credit card is one of the most extensive methods of instalment for both online and offline modes of payment for electronic transactions in recent times. The invention of credit cards has provided significant ease in electronic transactions. However, it has also provided new fraud opportunities for criminals, which results in increased fraud rates. A substantial amount of money has been lost by many institut…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Part of the M.Tech. thesis. Sourav Verma, ABV-Indian Institute of Information Technology, Gwalior 2013-18

  36. arXiv:2409.13385  [pdf, other]

    cs.CL cs.IR

    Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey

    Authors: Sourav Verma

    Abstract: Large Language Models (LLMs) showcase remarkable abilities, yet they struggle with limitations such as hallucinations, outdated knowledge, opacity, and inexplicable reasoning. To address these challenges, Retrieval-Augmented Generation (RAG) has proven to be a viable solution, leveraging external databases to improve the consistency and coherence of generated content, especially valuable for compl…

    Submitted 2 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Ongoing Work

  37. Li-MSD: A lightweight mitigation solution for DAO insider attack in RPL-based IoT

    Authors: Abhishek Verma, Sachin Kumar Verma, Avinash Chandra Pandey, Jyoti Grover, Girish Sharma

    Abstract: Many IoT applications run on a wireless infrastructure supported by resource-constrained nodes, popularly known as Low-Power and Lossy Networks (LLNs). Currently, LLNs play a vital role in the digital transformation of industries. The resource limitations of LLNs restrict the usage of traditional routing protocols and therefore require an energy-efficient routing solution. IETF's Routing Proto…

    Submitted 16 September, 2024; originally announced September 2024.

    Journal ref: Future Generation Computer Systems, 159, 327-339 (2024)

  38. arXiv:2409.06122  [pdf, other]

    cs.AI cs.SE

    Case Study: Leveraging GenAI to Build AI-based Surrogates and Regressors for Modeling Radio Frequency Heating in Fusion Energy Science

    Authors: E. Wes Bethel, Vianna Cramer, Alexander del Rio, Lothar Narins, Chris Pestano, Satvik Verma, Erick Arias, Nicola Bertelli, Talita Perciano, Syun'ichi Shiraiwa, Álvaro Sánchez Villar, Greg Wallace, John C. Wright

    Abstract: This work presents a detailed case study on using Generative AI (GenAI) to develop AI surrogates for simulation models in fusion energy research. The scope includes the methodology, implementation, and results of using GenAI to assist in model development and optimization, comparing these results with previous manually developed models.

    Submitted 9 September, 2024; originally announced September 2024.

    Report number: LBNL-2001609

  39. arXiv:2408.12112  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

    Authors: Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe

    Abstract: LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the prese…

    Submitted 7 July, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

  40. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, et al. (536 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 23 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  41. arXiv:2407.12131  [pdf, other]

    cs.CY cs.AI cs.LG cs.MA

    Improving Health Information Access in the World's Largest Maternal Mobile Health Program via Bandit Algorithms

    Authors: Arshika Lalan, Shresth Verma, Paula Rodriguez Diaz, Panayiotis Danassis, Amrita Mahale, Kumar Madhu Sudan, Aparna Hegde, Milind Tambe, Aparna Taneja

    Abstract: Harnessing the wide-spread availability of cell phones, many nonprofits have launched mobile health (mHealth) programs to deliver information via voice or text to beneficiaries in underserved communities, with maternal and infant health being a key area of such mHealth programs. Unfortunately, dwindling listenership is a major challenge, requiring targeted interventions using limited resources. Th…

    Submitted 14 May, 2024; originally announced July 2024.

    Comments: Published at Innovative Applications of Artificial Intelligence (IAAI 2024)

  42. arXiv:2407.08003  [pdf, other]

    cs.LG cs.AI

    Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data

    Authors: Ritesh Mehta, Aleksandar Pramov, Shashank Verma

    Abstract: Amyotrophic Lateral Sclerosis (ALS) is characterized as a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options in the realm of medical interventions and therapies. The disease showcases a diverse range of onset patterns and progression trajectories, emphasizing the critical importance of early detection of functional decline to enable tailored care…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Paper submitted to CLEF 2024 CEUR-WS

  43. arXiv:2407.05255  [pdf, other]

    cs.CV

    Estimation of the Area and Precipitation Associated with a Tropical Cyclone Biparjoy by using Image Processing

    Authors: Shikha Verma, Kuldeep Srivastava, Akhilesh Tiwari, Shekhar Verma

    Abstract: The rainfall associated with Tropical Cyclones (TC) contributes a major amount to the annual rainfall in India. Due to the limited research on the quantitative precipitation associated with Tropical Cyclones (TC), the prediction of the amount of precipitation and area that it may cover remains a challenge. This paper proposes an approach to estimate the accumulated precipitation and impact on affecte…

    Submitted 7 July, 2024; originally announced July 2024.

  44. arXiv:2406.15444  [pdf, other]

    cs.CL

    Cutting Through the Noise: Boosting LLM Performance on Math Word Problems

    Authors: Ujjwala Anantheswaran, Himanshu Gupta, Kevin Scaria, Shreyas Verma, Chitta Baral, Swaroop Mishra

    Abstract: Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. We introduce a dataset, PROBLEMATHIC, containing both adversarial and non-adversarial MWPs. Our experim…

    Submitted 24 October, 2024; v1 submitted 30 May, 2024; originally announced June 2024.

  45. arXiv:2406.13439  [pdf, other]

    cs.CL

    Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

    Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra

    Abstract: Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework d…

    Submitted 26 November, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024

  46. arXiv:2406.11930  [pdf, other]

    cs.SE cs.AI cs.CL

    A Critical Study of What Code-LLMs (Do Not) Learn

    Authors: Abhinav Anand, Shweta Verma, Krishna Narasimhan, Mira Mezini

    Abstract: Large Language Models trained on code corpora (code-LLMs) have demonstrated impressive performance in various coding assistance tasks. However, despite their increased size and training dataset, code-LLMs still have limitations such as suggesting code with syntactic errors, variable misuse, etc. Some studies argue that code-LLMs perform well on coding tasks because they use self-attention and hidd…

    Submitted 17 June, 2024; originally announced June 2024.

  47. arXiv:2405.16681  [pdf, other]

    cs.CL

    Triple Preference Optimization: Achieving Better Alignment using a Single Step Optimization

    Authors: Amir Saeidi, Shivanshu Verma, Aswin RRV, Kashif Rasul, Chitta Baral

    Abstract: Reinforcement Learning with Human Feedback (RLHF) enhances the alignment of Large Language Models (LLMs). However, its limitations have led to the development of Direct Preference Optimization (DPO), an RL-free approach designed to overcome these shortcomings. While studies have shown that DPO improves instruction-following capabilities, it negatively impacts the reasoning ability of LLMs. Additio…

    Submitted 17 February, 2025; v1 submitted 26 May, 2024; originally announced May 2024.

  48. arXiv:2405.12299  [pdf, other]

    cs.LG cs.AI cs.CV

    Perturbing the Gradient for Alleviating Meta Overfitting

    Authors: Manas Gogoi, Sambhavi Tiwari, Shekhar Verma

    Abstract: Meta Overfitting can be attributed to two factors: Mutual Non-exclusivity and the Lack of diversity, as a consequence of which a single global function can fit the support set data of all the meta-training tasks and fail to generalize to new unseen tasks. This issue is evidenced by low error rates on the meta-training tasks, but high error rates on new tasks. However, there can be a numbe…

    Submitted 10 November, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  49. arXiv:2404.14723  [pdf, other]

    cs.CL

    Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

    Authors: Amir Saeidi, Shivanshu Verma, Md Nayem Uddin, Chitta Baral

    Abstract: This study evaluates Direct Preference Optimization (DPO) and its variants for aligning Large Language Models (LLMs) with human preferences, testing three configurations: (1) with Supervised Fine Tuning (SFT), (2) without SFT, and (3) without SFT but using an instruction tuned model. We further investigate how training set size influences model performance. Our evaluation spans 13 benchmarks cover…

    Submitted 8 February, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

  50. arXiv:2402.13135  [pdf]

    cs.DC

    A Systematic Literature Review on Task Allocation and Performance Management Techniques in Cloud Data Center

    Authors: Nidhika Chauhan, Navneet Kaur, Kamaljit Singh Saini, Sahil Verma, Abdulatif Alabdulatif, Ruba Abu Khurma, Maribel Garcia-Arenas, Pedro A. Castillo

    Abstract: As cloud computing usage grows, cloud data centers play an increasingly important role. To maximize resource utilization, ensure service quality, and enhance system performance, it is crucial to allocate tasks and manage performance effectively. The purpose of this study is to provide an extensive analysis of task allocation and performance management techniques employed in cloud data centers. The…

    Submitted 20 February, 2024; originally announced February 2024.