Showing 1–50 of 196 results for author: Srivastava, S

Searching in archive cs.
  1. arXiv:2505.02278  [pdf, other]

    cs.CV

    Compositional Image-Text Matching and Retrieval by Grounding Entities

    Authors: Madhukar Reddy Vongala, Saurabh Srivastava, Jana Košecká

    Abstract: Vision-language pretraining on large datasets of image-text pairs is one of the main building blocks of current Vision-Language Models. With additional training, these models excel in various downstream tasks, including visual question answering, image captioning, and visual commonsense reasoning. However, a notable weakness of pretrained models like CLIP is their inability to perform enti…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: Accepted at CVPR-W

  2. arXiv:2505.02222  [pdf, other]

    cs.LG stat.ML

    Practical Efficiency of Muon for Pretraining

    Authors: Essential AI: Ishaan Shah, Anthony M. Polloreno, Karl Stratos, Philip Monk, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Ashish Tanwer, Darsh J Shah, Khoi Nguyen, Kurt Smith, Michael Callahan, Michael Pust, Mohit Parmar, Peter Rushton, Platon Mazarakis, Ritvik Kapila, Saurabh Srivastava, Somanshu Singla, Tim Romanski, Yash Vanjani, Ashish Vaswani

    Abstract: We demonstrate that Muon, the simplest instantiation of a second-order optimizer, explicitly expands the Pareto frontier over AdamW on the compute-time tradeoff. We find that Muon is more effective than AdamW in retaining data efficiency at large batch sizes, far beyond the so-called critical batch size, while remaining computationally efficient, thus enabling more economical training. We study th…

    Submitted 10 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

  3. arXiv:2504.20333  [pdf, other]

    cs.DS cs.CC cs.IT

    List Decoding Expander-Based Codes up to Capacity in Near-Linear Time

    Authors: Shashank Srivastava, Madhur Tulsiani

    Abstract: We give a new framework based on graph regularity lemmas for list decoding and list recovery of codes based on spectral expanders. Using existing algorithms for computing regularity decompositions of sparse graphs in (randomized) near-linear time, and appropriate choices for the constant-sized inner/base codes, we prove the following: - Expander-based codes constructed using the distance amplif…

    Submitted 28 April, 2025; originally announced April 2025.

  4. arXiv:2504.07357  [pdf, ps, other]

    cs.CL

    Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction

    Authors: Saurabh Srivastava, Ziyu Yao

    Abstract: Large Reasoning Models (LRMs) such as DeepSeek-R1 and OpenAI o1 have demonstrated remarkable capabilities in various reasoning tasks. Their strong capability to generate and reason over intermediate thoughts has also led to arguments that they may no longer require extensive prompt engineering or optimization to interpret human instructions and produce accurate outputs. In this work, we aim to sys…

    Submitted 9 April, 2025; originally announced April 2025.

  5. arXiv:2504.04022  [pdf, other]

    cs.CL cs.AI

    Rethinking Reflection in Pre-Training

    Authors: Essential AI: Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ashish Vaswani, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Anthony Polloreno, Ashish Tanwer, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Ishaan Shah, Karl Stratos, Khoi Nguyen, Michael Callahan, Michael Pust, Mrinal Iyer, Philip Monk, et al. (4 additional authors not shown)

    Abstract: A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier, during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model c…

    Submitted 4 April, 2025; originally announced April 2025.

  6. arXiv:2502.18848  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    A Causal Lens for Evaluating Faithfulness Metrics

    Authors: Kerem Zaman, Shashank Srivastava

    Abstract: Large Language Models (LLMs) offer natural language explanations as an alternative to feature attribution methods for model interpretability. However, despite their plausibility, they may not reflect the model's internal reasoning faithfully, which is crucial for understanding the model's true decision-making processes. Although several faithfulness metrics have been proposed, a unified evaluation…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 18 pages, 18 figures, 6 tables

  7. arXiv:2502.18710  [pdf, other]

    q-bio.NC cs.AI

    Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts

    Authors: Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla

    Abstract: Understanding convergent learning (the extent to which artificial and biological neural networks develop similar representations) is crucial for neuroscience and AI, as it reveals shared learning principles and guides brain-like model design. While several studies have noted convergence in early and late layers of vision networks, key gaps remain. First, much existing work relies on a limited…

    Submitted 25 February, 2025; originally announced February 2025.

  8. arXiv:2502.16377  [pdf, other]

    cs.CL

    Instruction-Tuning LLMs for Event Extraction with Annotation Guidelines

    Authors: Saurabh Srivastava, Sweta Pati, Ziyu Yao

    Abstract: In this work, we study the effect of annotation guidelines (textual descriptions of event types and arguments) when instruction-tuning large language models for event extraction. We conducted a series of experiments with both human-provided and machine-generated guidelines in both full- and low-data settings. Our results demonstrate the promise of annotation guidelines when there is a decent amo…

    Submitted 22 February, 2025; originally announced February 2025.

  9. arXiv:2502.16255  [pdf]

    eess.SP cs.AI cs.LG

    rECGnition_v2.0: Self-Attentive Canonical Fusion of ECG and Patient Data using deep learning for effective Cardiac Diagnostics

    Authors: Shreya Srivastava, Durgesh Kumar, Ram Jiwari, Sandeep Seth, Deepak Sharma

    Abstract: The variability in ECG readings influenced by individual patient characteristics has posed a considerable challenge to adopting automated ECG analysis in clinical settings. A novel feature fusion technique termed SACC (Self Attentive Canonical Correlation) was proposed to address this. This technique is combined with DPN (Dual Pathway Network) and depth-wise separable convolution to create a robus…

    Submitted 22 February, 2025; originally announced February 2025.

  10. arXiv:2502.07308  [pdf, other]

    cs.IT cs.CC

    Explicit Codes approaching Generalized Singleton Bound using Expanders

    Authors: Fernando Granha Jeronimo, Tushant Mittal, Shashank Srivastava, Madhur Tulsiani

    Abstract: We construct a new family of explicit codes that are list decodable to capacity and achieve an optimal list size of $O(1/\varepsilon)$. In contrast to existing explicit constructions of codes achieving list decoding capacity, our arguments do not rely on algebraic structure but utilize simple combinatorial properties of expander graphs. Our construction is based on a celebrated distance amplificatio…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: STOC 2025

  11. arXiv:2501.06348  [pdf, other]

    cs.HC cs.RO

    Why Automate This? Exploring the Connection between Time Use, Well-being and Robot Automation Across Social Groups

    Authors: Ruchira Ray, Leona Pang, Sanjana Srivastava, Li Fei-Fei, Samantha Shorey, Roberto Martín-Martín

    Abstract: Understanding the motivations underlying the human inclination to automate tasks is vital to developing truly helpful robots integrated into daily life. Accordingly, we ask: are individuals more inclined to automate chores based on the time they consume or the feelings experienced while performing them? This study explores these preferences and whether they vary across different social groups (i.e…

    Submitted 10 January, 2025; originally announced January 2025.

    Comments: 20 pages, 14 figures

  12. arXiv:2412.16395  [pdf, other]

    cs.AI

    Autonomous Option Invention for Continual Hierarchical Reinforcement Learning and Planning

    Authors: Rashmeet Kaur Nayyar, Siddharth Srivastava

    Abstract: Abstraction is key to scaling up reinforcement learning (RL). However, autonomously learning abstract state and action representations to enable transfer and generalization remains a challenging open problem. This paper presents a novel approach for inventing, representing, and utilizing options, which represent temporally extended behaviors, in continual RL settings. Our approach addresses stream…

    Submitted 20 December, 2024; originally announced December 2024.

  13. arXiv:2412.11388  [pdf, other]

    cs.CL

    INTERACT: Enabling Interactive, Question-Driven Learning in Large Language Models

    Authors: Aum Kendapadi, Kerem Zaman, Rakesh R. Menon, Shashank Srivastava

    Abstract: Large language models (LLMs) excel at answering questions but remain passive learners, absorbing static data without the ability to question and refine knowledge. This paper explores how LLMs can transition to interactive, question-driven learning through student-teacher dialogues. We introduce INTERACT (INTERactive Learning for Adaptive Concept Transfer), a framework in which a "student" LLM en…

    Submitted 15 December, 2024; originally announced December 2024.

    Comments: 30 pages, 8 figures, 14 tables

  14. arXiv:2412.05528  [pdf]

    cs.AI

    AI Planning: A Primer and Survey (Preliminary Report)

    Authors: Dillon Z. Chen, Pulkit Verma, Siddharth Srivastava, Michael Katz, Sylvie Thiébaux

    Abstract: Automated decision-making is a fundamental topic that spans multiple sub-disciplines in AI: reinforcement learning (RL), AI planning (AP), foundation models, and operations research, among others. Despite recent efforts to "bridge the gaps" between these communities, there remain many insights that have not yet transcended the boundaries. Our goal in this paper is to provide a brief and non-exha…

    Submitted 6 December, 2024; originally announced December 2024.

  15. arXiv:2411.17957  [pdf, other]

    cs.CV

    Optimization-Free Image Immunization Against Diffusion-Based Editing

    Authors: Tarik Can Ozden, Ozgur Kara, Oguzhan Akcin, Kerem Zaman, Shashank Srivastava, Sandeep P. Chinchali, James M. Rehg

    Abstract: Current image immunization defense techniques against diffusion-based editing embed imperceptible noise in target images to disrupt editing models. However, these methods face scalability challenges, as they require time-consuming re-optimization for each image, taking hours for small batches. To address these challenges, we introduce DiffVax, a scalable, lightweight, and optimization-free framewor…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Project webpage: https://diffvax.github.io/

  16. arXiv:2411.14633  [pdf, other]

    q-bio.NC cs.AI cs.CV

    Evaluating Representational Similarity Measures from the Lens of Functional Correspondence

    Authors: Yiqing Bo, Ansh Soni, Sudhanshu Srivastava, Meenakshi Khosla

    Abstract: Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data, where the comparative analysis of such data is crucial for revealing shared mechanisms and differences between these complex systems. Despite the widespread use of representational comparisons and the many available classes of comparison methods, a critical question remains: which metrics…

    Submitted 21 November, 2024; originally announced November 2024.

  17. arXiv:2411.13903  [pdf]

    eess.SP cs.AI cs.LG

    AmpliNetECG12: A lightweight SoftMax-based relativistic amplitude amplification architecture for 12 lead ECG classification

    Authors: Shreya Srivastava

    Abstract: The urgent need to promptly detect cardiac disorders from 12-lead electrocardiograms using limited computation is motivated by the heart's fast and complex electrical activity and the restricted computational power of portable devices. Timely and precise diagnoses are crucial since delays might significantly impact patient health outcomes. This research presents a novel deep-learning architecture tha…

    Submitted 21 November, 2024; originally announced November 2024.

  18. arXiv:2411.06559  [pdf, other]

    cs.AI

    Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

    Authors: Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su

    Abstract: Language agents based on large language models (LLMs) have demonstrated great promise in automating web-based tasks. Recent work has shown that incorporating advanced planning algorithms, e.g., tree search, is advantageous over reactive planning for web agents. However, unlike simulated sandbox environments, real-world environments such as the web are rife with irreversible actions. This undermine…

    Submitted 1 April, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

    Comments: 22 pages, 11 figures, 6 tables

  19. arXiv:2411.04517  [pdf]

    cs.LG cs.AI cs.CV cs.MM

    Continuous Sign Language Recognition System using Deep Learning with MediaPipe Holistic

    Authors: Sharvani Srivastava, Sudhakar Singh, Pooja, Shiv Prakash

    Abstract: Sign languages are the languages of hearing-impaired people, who use visual cues such as hand, facial, and body movements for communication. There are different signs and gestures representing alphabets, words, and phrases. Approximately 300 sign languages are practiced worldwide today, such as American Sign Language (ASL), Chinese Sign Language (CSL), Indian Sign Language (ISL), and many more.…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: 14 pages, 4 figures, Wireless Pers Commun

    Report number: WIRE-D-22-02256

    Journal ref: Wireless Personal Communications, 2024

  20. arXiv:2411.04306  [pdf, other]

    cs.IT quant-ph

    List Decodable Quantum LDPC Codes

    Authors: Thiago Bergamaschi, Fernando Granha Jeronimo, Tushant Mittal, Shashank Srivastava, Madhur Tulsiani

    Abstract: We give a construction of Quantum Low-Density Parity Check (QLDPC) codes with near-optimal rate-distance tradeoff and efficient list decoding up to the Johnson bound in polynomial time. Previous constructions of list decodable good distance quantum codes either required access to a classical side channel or were based on algebraic constructions that preclude the LDPC property. Our construction r…

    Submitted 6 November, 2024; originally announced November 2024.

  21. arXiv:2410.22239  [pdf, other]

    cs.CL cs.LG

    DISCERN: Decoding Systematic Errors in Natural Language for Text Classifiers

    Authors: Rakesh R. Menon, Shashank Srivastava

    Abstract: Despite their high predictive accuracies, current machine learning systems often exhibit systematic biases stemming from annotation artifacts or insufficient support for certain classes in the dataset. Recent work proposes automatic methods for identifying and explaining systematic biases using keywords. We introduce DISCERN, a framework for interpreting systematic biases in text classifiers using…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: 20 pages, 9 figures, 15 tables; Accepted to EMNLP 2024

  22. arXiv:2410.19925  [pdf, other]

    cs.CL cs.CV cs.LG

    Improving Multimodal Large Language Models Using Continual Learning

    Authors: Shikhar Srivastava, Md Yousuf Harun, Robik Shrestha, Christopher Kanan

    Abstract: Generative large language models (LLMs) exhibit impressive capabilities, which can be further augmented by integrating a pre-trained vision model into the original LLM to create a multimodal LLM (MLLM). However, this integration often significantly decreases performance on natural language understanding and generation tasks, compared to the original LLM. This study investigates this issue using th…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 Workshop on Scalable Continual Learning for Lifelong Foundation Models

  23. arXiv:2410.19858  [pdf, other]

    cs.LG cs.CE eess.SP physics.geo-ph

    Enhancing Deep Learning based RMT Data Inversion using Gaussian Random Field

    Authors: Koustav Ghosal, Arun Singh, Samir Malakar, Shalivahan Srivastava, Deepak Gupta

    Abstract: Deep learning (DL) methods have emerged as a powerful tool for the inversion of geophysical data. When applied to field data, these models often struggle without additional fine-tuning of the network. This is because they are built on the assumption that the statistical patterns in the training and test datasets are the same. To address this, we propose a DL-based inversion scheme for Radio Magnet…

    Submitted 22 October, 2024; originally announced October 2024.

  24. arXiv:2410.18985  [pdf]

    eess.SP cs.AI cs.LG

    rECGnition_v1.0: Arrhythmia detection using cardiologist-inspired multi-modal architecture incorporating demographic attributes in ECG

    Authors: Shreya Srivastava, Durgesh Kumar, Jatin Bedi, Sandeep Seth, Deepak Sharma

    Abstract: Substantial variability in ECG arising from patient characteristics hinders the adoption of automated analysis algorithms in clinical practice. None of the ECG annotators developed to date considers the characteristics of the patients in a multi-modal architecture. We employed the XGBoost model to analyze the UCI Arrhythmia dataset, linking patient characteristics to ECG morpholo…

    Submitted 9 October, 2024; originally announced October 2024.

  25. arXiv:2410.17481  [pdf, other]

    cs.AI

    AI, Global Governance, and Digital Sovereignty

    Authors: Swati Srivastava, Justin Bullock

    Abstract: This essay examines how Artificial Intelligence (AI) systems are becoming more integral to international affairs by affecting how global governors exert power and pursue digital sovereignty. We first introduce a taxonomy of multifaceted AI payoffs for governments and corporations related to instrumental, structural, and discursive power in the domains of violence, markets, and rights. We next leve…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 21 pages, 2 tables

  26. arXiv:2410.09031  [pdf, other]

    cs.IT cs.CC

    Improved List Size for Folded Reed-Solomon Codes

    Authors: Shashank Srivastava

    Abstract: Folded Reed-Solomon (FRS) codes are variants of Reed-Solomon codes, known for their optimal list decoding radius. We show explicit FRS codes with rate $R$ that can be list decoded up to radius $1-R-\varepsilon$ with lists of size $\mathcal{O}(1/\varepsilon^2)$. This improves the best known list size among explicit list decoding capacity achieving codes. We also show a more general result that for any $k\geq 1$, th…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: Accepted to SODA 2025

  27. arXiv:2410.08698  [pdf, other]

    cs.CL cs.CY

    SocialGaze: Improving the Integration of Human Social Norms in Large Language Models

    Authors: Anvesh Rao Vijjini, Rakesh R. Menon, Jiayi Fu, Shashank Srivastava, Snigdha Chaturvedi

    Abstract: While much research has explored enhancing the reasoning capabilities of large language models (LLMs) in the last few years, there is a gap in understanding the alignment of these models with social values and norms. We introduce the task of judging social acceptance. Social acceptance requires models to judge and rationalize the acceptability of people's actions in social situations. For example,…

    Submitted 11 October, 2024; originally announced October 2024.

  28. arXiv:2410.08437  [pdf, other]

    cs.AI cs.CL

    Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks

    Authors: Rushang Karia, Daniel Bramblett, Daksh Dobhal, Siddharth Srivastava

    Abstract: This paper presents AutoEval, a novel benchmark for scaling Large Language Model (LLM) assessment in formal tasks with clear notions of correctness, such as truth maintenance in translation and logical reasoning. AutoEval is the first benchmarking paradigm that offers several key advantages necessary for scaling objective evaluation of LLMs without human labeling: (a) ability to evaluate LLMs of i…

    Submitted 11 April, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

  29. arXiv:2410.07166  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

    Authors: Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu

    Abstract: We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because they are usually applied in different domains, for different purposes, and built based on different inputs and outputs. Furthermore, existing evalua…

    Submitted 19 January, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted for oral presentation at NeurIPS 2024 in the Datasets and Benchmarks track. Final camera-ready version

  30. arXiv:2409.12002  [pdf, other]

    cs.RO cs.CV

    Towards Global Localization using Multi-Modal Object-Instance Re-Identification

    Authors: Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna

    Abstract: Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance…

    Submitted 1 May, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: 8 pages, 5 figures, 3 tables. Accepted at Advances in Robotics, AIR 2025 (Oral)

    MSC Class: 68T40

    ACM Class: I.2.9; I.2.10

  31. arXiv:2409.08605  [pdf, other]

    eess.AS cs.SD

    Effective Integration of KAN for Keyword Spotting

    Authors: Anfeng Xu, Biqiao Zhang, Shuyu Kong, Yiteng Huang, Zhaojun Yang, Sangeeta Srivastava, Ming Sun

    Abstract: Keyword spotting (KWS) is an important speech processing component for smart devices with voice assistance capability. In this paper, we investigate whether Kolmogorov-Arnold Networks (KAN) can be used to enhance the performance of KWS. We explore various approaches to integrate KAN for a model architecture based on 1D Convolutional Neural Networks (CNN). We find that KAN is effective at modeling high-…

    Submitted 11 January, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted to ICASSP 2025

  32. arXiv:2408.14652  [pdf, other]

    cs.IT cs.DS

    Continuous Optimization for Decoding Errors

    Authors: Shashank Srivastava

    Abstract: Error-correcting codes are one of the most fundamental objects in pseudorandomness, with applications in communication, complexity theory, and beyond. Codes are useful because of their ability to support decoding, which is the task of recovering a codeword from its noisy copy. List decoding is a relaxation where the decoder is allowed to output a list of codewords, and has seen tremendous progress…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: PhD Thesis, 235 pages

  33. arXiv:2408.05334  [pdf, other]

    cs.AI cs.CL cs.CV

    Revisiting Multi-Modal LLM Evaluation

    Authors: Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan

    Abstract: With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the earliest ones created, and they have many known problems, including extreme bias, spurious correlations, and an inability to permit fine-grained analys…

    Submitted 9 August, 2024; originally announced August 2024.

  34. arXiv:2405.20772  [pdf]

    cs.LG cs.CY

    Reinforcement Learning for Sociohydrology

    Authors: Tirthankar Roy, Shivendra Srivastava, Beichen Zhang

    Abstract: In this study, we discuss how reinforcement learning (RL) provides an effective and efficient framework for solving sociohydrology problems. The efficacy of RL for these types of problems is evident because of its ability to update policies iteratively, something that is also foundational to sociohydrology, where we are interested in representing the co-evolution of human-water interac…

    Submitted 31 May, 2024; originally announced May 2024.

  35. arXiv:2405.15907  [pdf, other]

    cs.AI

    Belief-State Query Policies for User-Aligned POMDPs

    Authors: Daniel Bramblett, Siddharth Srivastava

    Abstract: Planning in real-world settings often entails addressing partial observability while aligning with users' requirements. We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) policies in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We…

    Submitted 15 April, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

  36. arXiv:2405.13004  [pdf, other]

    cs.CL cs.AI

    MathDivide: Improved mathematical reasoning by large language models

    Authors: Saksham Sahai Srivastava, Ashutosh Gandhi

    Abstract: Large language models have been shown to handle complex linguistic and cognitive tasks. Therefore, their use has been extended to tasks requiring logical reasoning, such as mathematics. In this paper, we propose a prompting technique called MathDivide that breaks down the mathematical problem into simpler subproblems. Each of the subproblems is formulated as an algebraic e…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures

  37. arXiv:2405.09546  [pdf, other]

    cs.CV

    BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

    Authors: Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu

    Abstract: The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and renderin…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 (Highlight). Project website: https://behavior-vision-suite.github.io/

  38. arXiv:2404.19095  [pdf]

    cs.HC cs.IR cs.LG cs.SI

    Catalyzing Social Interactions in Mixed Reality using ML Recommendation Systems

    Authors: Sparsh Srivastava, Rohan Arora

    Abstract: We create an innovative mixed reality-first social recommendation model, utilizing features uniquely collected through mixed reality (MR) systems to promote social interaction, such as gaze recognition, proximity, noise level, congestion level, and conversational intensity. We further extend these models to include right-time features to deliver timely notifications. We measure performance metrics…

    Submitted 29 April, 2024; originally announced April 2024.

  39. arXiv:2404.15325  [pdf]

    cs.HC

    Quantifying Social Presence in Mixed Reality: A Contemporary Review of Techniques and Innovations

    Authors: Sparsh Srivastava

    Abstract: This literature review investigates the transformative potential of mixed reality (MR) technology, where we explore the intersection of contemporary technological advancements, modern deep learning recommendation systems, and social psychology frameworks. This interdisciplinary study informs the understanding of MR's role in improving social presence, catalyzing novel social interactions, and enha…

    Submitted 26 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  40. arXiv:2404.12415  [pdf]

    eess.IV cs.CV cs.LG

    Prediction of soil fertility parameters using USB-microscope imagery and portable X-ray fluorescence spectrometry

    Authors: Shubhadip Dasgupta, Satwik Pate, Divya Rathore, L. G. Divyanth, Ayan Das, Anshuman Nayak, Subhadip Dey, Asim Biswas, David C. Weindorf, Bin Li, Sergio Henrique Godinho Silva, Bruno Teixeira Ribeiro, Sanjay Srivastava, Somsubhra Chakraborty

    Abstract: This study investigated the use of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis for rapid soil fertility assessment, with a focus on key indicators such as available boron (B), organic carbon (OC), available manganese (Mn), available sulfur (S), and the sulfur availability index (SAI). A total of 1,133 soil samples from diverse agro-climatic zones in Eastern India were a…

    Submitted 5 September, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Published in 'Soil Advances'

    Journal ref: Soil Advances, Volume 2, 2024, 100016

  41. arXiv:2404.00846  [pdf, other]

    cs.CV cs.LG

    Transfer Learning with Point Transformers

    Authors: Kartik Gupta, Rahul Vippala, Sahima Srivastava

    Abstract: Point Transformers are near state-of-the-art models for classification, segmentation, and detection tasks on point cloud data. They utilize a self-attention-based mechanism to model long-range spatial dependencies between multiple point sets. In this project, we explore two things: the classification performance of these attention-based networks on the ModelNet10 dataset, and then, we use the trained model…

    Submitted 31 March, 2024; originally announced April 2024.

  42. arXiv:2404.00808  [pdf, other]

    cs.RO

    Using Explainable AI and Hierarchical Planning for Outreach with Robots

    Authors: Rushang Karia, Jayesh Nagpal, Daksh Dobhal, Pulkit Verma, Rashmeet Kaur Nayyar, Naman Shah, Siddharth Srivastava

    Abstract: Understanding how robots plan and execute tasks is crucial in today's world, where they are becoming more prevalent in our daily lives. However, teaching non-experts, such as K-12 students, the complexities of robot planning can be challenging. This work presents an open-source platform, \nameAbbr{}, that simplifies the process using a visual interface that abstracts the details of various plannin…

    Submitted 11 November, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  43. arXiv:2403.18327  [pdf, other]

    cs.CL cs.AI

    ∀uto∃val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks

    Authors: Rushang Karia, Daniel Bramblett, Daksh Dobhal, Pulkit Verma, Siddharth Srivastava

    Abstract: This paper presents ∀uto∃val, a new approach for scaling LLM assessment in translating formal syntax -- such as first-order logic, regular expressions, etc -- to natural language (interpretation) or vice versa (compilation), thereby facilitating their use in applications such as generating/explaining logic and control flow for programs etc. Existing approaches for LLM assessment in…

    Submitted 21 July, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  44. arXiv:2403.09227  [pdf, other]

    cs.RO cs.AI

    BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

    Authors: Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, et al. (10 additional authors not shown)

    Abstract: We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: A preliminary version was published at 6th Conference on Robot Learning (CoRL 2022)

  45. arXiv:2403.05022  [pdf, other]

    cs.SE

    Effective Fault Localization using Probabilistic and Grouping Approach

    Authors: Saksham Sahai Srivastava, Arpita Dutta, Rajib Mall

    Abstract: Context: Fault localization (FL) is the key activity while debugging a program. Any improvement to this activity leads to significant improvement in total software development cost. There is an internal linkage between the program spectrum and test execution result. Conditional probability in statistics captures the probability of occurring one event in relationship to one or more other events. Ob…

    Submitted 7 March, 2024; originally announced March 2024.

  46. arXiv:2403.03360  [pdf, other]

    cs.CR

    Bridge the Future: High-Performance Networks in Confidential VMs without Trusted I/O devices

    Authors: Mengyuan Li, Shashvat Srivastava, Mengjia Yan

    Abstract: Trusted I/O (TIO) is an appealing solution to improve I/O performance for confidential VMs (CVMs), with the potential to eliminate broad sources of I/O overhead. However, this paper emphasizes that not all types of I/O can derive substantial benefits from TIO, particularly network I/O. Given the obligatory use of encryption protocols for network traffic in CVM's threat model, TIO's approach of I/O…

    Submitted 5 March, 2024; originally announced March 2024.

  47. arXiv:2402.19450  [pdf, other]

    cs.AI cs.CL

    Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap

    Authors: Saurabh Srivastava, Annarose M B, Anto P V, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

    Abstract: We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 37 pages, 10 figures

  48. arXiv:2402.11871  [pdf, other]

    cs.RO cs.AI

    From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw Data

    Authors: Naman Shah, Jayesh Nagpal, Pulkit Verma, Siddharth Srivastava

    Abstract: Hand-crafted, logic-based state and action representations have been widely used to overcome the intractable computational complexity of long-horizon robot planning problems, including task and motion planning problems. However, creating such representations requires experts with strong intuitions and detailed knowledge about the robot and the tasks it may need to accomplish in a given setting. Re…

    Submitted 4 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  49. Asynchronous Distributed Coordinated Hybrid Precoding in Multi-cell mmWave Wireless Networks

    Authors: Meesam Jafri, Suraj Srivastava, Sunil Kumar, Aditya K. Jagannatham, Lajos Hanzo

    Abstract: Asynchronous distributed hybrid beamformers (ADBF) are conceived for minimizing the total transmit power subject to signal-to-interference-plus-noise ratio (SINR) constraints at the users. Our design requires only limited information exchange between the base stations (BSs) of the mmWave multi-cell coordinated (MCC) networks considered. To begin with, a semidefinite relaxation (SDR)-based fully-di…

    Submitted 13 February, 2024; originally announced February 2024.

    Journal ref: IEEE Open Journal of Vehicular Technology, vol. 5, pp. 200-218, 2024

  50. Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings

    Authors: Rushang Karia, Pulkit Verma, Alberto Speranzon, Siddharth Srivastava

    Abstract: This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed fra…

    Submitted 6 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: To appear at ICAPS-24

    Journal ref: Proceedings of the International Conference on Automated Planning and Scheduling, 34(1), 310-318, 2024