Showing 1–50 of 160 results for author: singh, H

  1. arXiv:2506.16507  [pdf, ps, other]

    cs.LG

    Robust Reward Modeling via Causal Rubrics

    Authors: Pragya Srivastava, Harman Singh, Rahul Madhavan, Gandharv Patil, Sravanti Addepalli, Arun Suggala, Rengarajan Aravamudhan, Soumya Sharma, Anirban Laha, Aravindan Raghuveer, Karthikeyan Shanmugam, Doina Precup

    Abstract: Reward models (RMs) are fundamental to aligning Large Language Models (LLMs) via human feedback, yet they often suffer from reward hacking. They tend to latch on to superficial or spurious attributes, such as response length or formatting, mistaking these cues learned from correlations in training data for the true causal drivers of quality (e.g., factuality, relevance). This occurs because standa…

    Submitted 19 June, 2025; originally announced June 2025.

  2. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  3. arXiv:2506.06476  [pdf, other]

    cs.RO

    Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception

    Authors: Pushyami Kaveti, Ambjorn Grimsrud Waldum, Hanumant Singh, Martin Ludvigsen

    Abstract: Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) demand robust spatial perception capabilities, including Simultaneous Localization and Mapping (SLAM), to support both remote and autonomous tasks. Vision-based systems have been integral to these advancements, capturing rich color and texture at low cost while enabling semantic scene understanding. However, underwater con…

    Submitted 6 June, 2025; originally announced June 2025.

  4. arXiv:2506.00756  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    "Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift

    Authors: Harvineet Singh, Fan Xia, Alexej Gossmann, Andrew Chuang, Julian C. Hong, Jean Feng

    Abstract: Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimiz…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 13 pages, 9 figures, 8 tables, 18 pages appendix. To be published in Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025

  5. arXiv:2505.16310  [pdf, ps, other]

    cs.CV eess.IV

    Paired and Unpaired Image to Image Translation using Generative Adversarial Networks

    Authors: Gaurav Kumar, Soham Satyadharma, Harpreet Singh

    Abstract: Image to image translation is an active area of research in the field of computer vision, enabling the generation of new images with different styles, textures, or resolutions while preserving their characteristic properties. Recent architectures leverage Generative Adversarial Networks (GANs) to transform input images from one domain to another. In this work, we focus on the study of both paired…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 6 pages

  6. arXiv:2505.09251  [pdf]

    cs.CV

    A Surrogate Model for the Forward Design of Multi-layered Metasurface-based Radar Absorbing Structures

    Authors: Vineetha Joy, Aditya Anand, Nidhi, Anshuman Kumar, Amit Sethi, Hema Singh

    Abstract: Metasurface-based radar absorbing structures (RAS) are highly preferred for applications like stealth technology, electromagnetic (EM) shielding, etc. due to their capability to achieve frequency selective absorption characteristics with minimal thickness and reduced weight penalty. However, the conventional approach for the EM design and optimization of these structures relies on forward simulati…

    Submitted 14 May, 2025; originally announced May 2025.

  7. arXiv:2504.21194  [pdf, other]

    cs.CV cs.AI

    Geolocating Earth Imagery from ISS: Integrating Machine Learning with Astronaut Photography for Enhanced Geographic Mapping

    Authors: Vedika Srivastava, Hemant Kumar Singh, Jaisal Singh

    Abstract: This paper presents a novel approach to geolocating images captured from the International Space Station (ISS) using advanced machine learning algorithms. Despite having precise ISS coordinates, the specific Earth locations depicted in astronaut-taken photographs often remain unidentified. Our research addresses this gap by employing three distinct image processing pipelines: a Neural Network base…

    Submitted 29 April, 2025; originally announced April 2025.

  8. arXiv:2504.13263  [pdf, other]

    cs.AI

    Causal-Copilot: An Autonomous Causal Analysis Agent

    Authors: Xinyue Wang, Kun Zhou, Wenyi Wu, Har Simrat Singh, Fang Nan, Songyao Jin, Aryan Philip, Saloni Patnaik, Hou Zhu, Shivam Singh, Parjanya Prashant, Qian Shen, Biwei Huang

    Abstract: Causal analysis plays a foundational role in scientific discovery and reliable decision-making, yet it remains largely inaccessible to domain experts due to its conceptual and algorithmic complexity. This disconnect between causal methodology and practical usability presents a dual challenge: domain experts are unable to leverage recent advances in causal learning, while causal researchers lack br…

    Submitted 21 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  9. Grade Guard: A Smart System for Short Answer Automated Grading

    Authors: Niharika Dadu, Harsh Vardhan Singh, Romi Banerjee

    Abstract: The advent of large language models (LLMs) in the education sector has provided impetus to automate grading short answer questions. LLMs make evaluating short answers very efficient, thus addressing issues like staff shortage. However, in the task of Automated Short Answer Grading (ASAG), LLM responses are influenced by diverse perspectives in their training dataset, leading to inaccuracies in eva…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 11 pages, 18 figures

    ACM Class: I.2.7

  10. arXiv:2503.19786  [pdf, other]

    cs.CL cs.AI

    Gemma 3 Technical Report

    Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

    Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achie…

    Submitted 25 March, 2025; originally announced March 2025.

  11. arXiv:2503.11851  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    DCAT: Dual Cross-Attention Fusion for Disease Classification in Radiological Images with Uncertainty Estimation

    Authors: Jutika Borah, Hidam Kumarjit Singh

    Abstract: Accurate and reliable image classification is crucial in radiology, where diagnostic decisions significantly impact patient outcomes. Conventional deep learning models tend to produce overconfident predictions despite underlying uncertainties, potentially leading to misdiagnoses. Attention mechanisms have emerged as powerful tools in deep learning, enabling models to focus on relevant parts of the…

    Submitted 19 March, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: 18 pages, 8 figures, 5 tables

  12. arXiv:2503.05394  [pdf, other]

    cs.SE cs.AI

    Static Program Analysis Guided LLM Based Unit Test Generation

    Authors: Sujoy Roychowdhury, Giriprasad Sridhara, A K Raghavan, Joy Bose, Sourav Mazumdar, Hamender Singh, Srinivasan Bajji Sugumaran, Ricardo Britto

    Abstract: We describe a novel approach to automating unit test generation for Java methods using large language models (LLMs). Existing LLM-based approaches rely on sample usage(s) of the method to test (focal method) and/or provide the entire class of the focal method as input prompt and context. The former approach is often not viable due to the lack of sample usages, especially for newly written focal me…

    Submitted 7 March, 2025; originally announced March 2025.

  13. arXiv:2502.03965  [pdf]

    cs.LG

    Innovative Framework for Early Estimation of Mental Disorder Scores to Enable Timely Interventions

    Authors: Himanshi Singh, Sadhana Tiwari, Sonali Agarwal, Ritesh Chandra, Sanjay Kumar Sonbhadra, Vrijendra Singh

    Abstract: Individual's general well-being is greatly impacted by mental health conditions including depression and Post-Traumatic Stress Disorder (PTSD), underscoring the importance of early detection and precise diagnosis in order to facilitate prompt clinical intervention. An advanced multimodal deep learning system for the automated classification of PTSD and depression is presented in this paper. Utiliz…

    Submitted 6 February, 2025; originally announced February 2025.

  14. arXiv:2502.03943  [pdf]

    cs.LG

    Multimodal Data-Driven Classification of Mental Disorders: A Comprehensive Approach to Diagnosing Depression, Anxiety, and Schizophrenia

    Authors: Himanshi Singh, Sadhana Tiwari, Sonali Agarwal, Ritesh Chandra, Sanjay Kumar Sonbhadra, Vrijendra Singh

    Abstract: This study investigates the potential of multimodal data integration, which combines electroencephalogram (EEG) data with sociodemographic characteristics like age, sex, education, and intelligence quotient (IQ), to diagnose mental diseases like schizophrenia, depression, and anxiety. Using Apache Spark and convolutional neural networks (CNNs), a data-driven classification pipeline has been develo…

    Submitted 6 February, 2025; originally announced February 2025.

  15. arXiv:2501.16204  [pdf, other]

    eess.SP cs.ET physics.app-ph

    Convolutions with Radio-Frequency Spin-Diodes

    Authors: Erwann Plouet, Hanuman Singh, Pankaj Sethi, Frank A. Mizrahi, Dedalo Sanz-Hernandez, Julie Grollier

    Abstract: The classification of radio-frequency (RF) signals is crucial for applications in robotics, traffic control, and medical devices. Spintronic devices, which respond to RF signals via ferromagnetic resonance, offer a promising solution. Recent studies have shown that a neural network of nanoscale magnetic tunnel junctions can classify RF signals without digitization. However, the complexity of these…

    Submitted 27 January, 2025; originally announced January 2025.

  16. arXiv:2501.06208  [pdf, other]

    cs.CL

    Enhancing AI Safety Through the Fusion of Low Rank Adapters

    Authors: Satya Swaroop Gudipudi, Sreeram Vipparla, Harpreet Singh, Shashwat Goel, Ponnurangam Kumaraguru

    Abstract: Instruction fine-tuning of large language models (LLMs) is a powerful method for improving task-specific performance, but it can inadvertently lead to a phenomenon where models generate harmful responses when faced with malicious prompts. In this paper, we explore Low-Rank Adapter Fusion (LoRA) as a means to mitigate these risks while preserving the model's ability to handle diverse instructions e…

    Submitted 30 December, 2024; originally announced January 2025.

  17. arXiv:2412.08048  [pdf, other]

    cs.CV cs.LG

    Surveying Facial Recognition Models for Diverse Indian Demographics: A Comparative Analysis on LFW and Custom Dataset

    Authors: Pranav Pant, Niharika Dadu, Harsh V. Singh, Anshul Thakur

    Abstract: Facial recognition technology has made significant advances, yet its effectiveness across diverse ethnic backgrounds, particularly in specific Indian demographics, is less explored. This paper presents a detailed evaluation of both traditional and deep learning-based facial recognition models using the established LFW dataset and our newly developed IITJ Faces of Academia Dataset (JFAD), which com…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Research Project - Computer Vision

  18. arXiv:2412.06089  [pdf, other]

    cs.CV

    GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis

    Authors: Ashish Goswami, Satyam Kumar Modi, Santhosh Rishi Deshineni, Harman Singh, Prathosh A. P, Parag Singla

    Abstract: Text-to-image (T2I) generation has seen significant progress with diffusion models, enabling generation of photo-realistic images from text prompts. Despite this progress, existing methods still face challenges in following complex text prompts, especially those requiring compositional and multi-step reasoning. Given such complex instructions, SOTA models often make mistakes in faithfully modeling…

    Submitted 11 March, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

  19. arXiv:2411.17636  [pdf, other]

    cs.RO cs.AI

    MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

    Authors: Harsh Singh, Rocktim Jyoti Das, Mingfei Han, Preslav Nakov, Ivan Laptev

    Abstract: Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs both for high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans i…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 48 pages

  20. arXiv:2411.16508  [pdf, other]

    cs.CV cs.CL

    All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

    Authors: Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, et al. (44 additional authors not shown)

    Abstract: Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All La…

    Submitted 30 April, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: A Multilingual Multimodal cultural benchmark for 100 languages

  21. arXiv:2411.05755  [pdf, other]

    cs.RO cs.CL cs.CV

    End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

    Authors: Dylan Goetting, Himanshu Gaurav Singh, Antonio Loquercio

    Abstract: We present VLMnav, an embodied framework to transform a Vision-Language Model (VLM) into an end-to-end navigation policy. In contrast to prior work, we do not rely on a separation between perception, planning, and control; instead, we use a VLM to directly select actions in one step. Surprisingly, we find that a VLM can be used as an end-to-end policy zero-shot, i.e., without any fine-tuning or ex…

    Submitted 8 November, 2024; originally announced November 2024.

  22. arXiv:2410.11923  [pdf, other]

    cs.LG cs.NE

    Spatial-Temporal Bearing Fault Detection Using Graph Attention Networks and LSTM

    Authors: Moirangthem Tiken Singh, Rabinder Kumar Prasad, Gurumayum Robert Michael, N. Hemarjit Singh, N. K. Kaphungkui

    Abstract: Purpose: This paper aims to enhance bearing fault diagnosis in industrial machinery by introducing a novel method that combines Graph Attention Network (GAT) and Long Short-Term Memory (LSTM) networks. This approach captures both spatial and temporal dependencies within sensor data, improving the accuracy of bearing fault detection under various conditions. Methodology: The proposed method convert…

    Submitted 15 October, 2024; originally announced October 2024.

    ACM Class: I.2; J.6

  23. arXiv:2410.08121  [pdf, other]

    cs.LG cs.AI

    Heterogeneous Graph Auto-Encoder for CreditCard Fraud Detection

    Authors: Moirangthem Tiken Singh, Rabinder Kumar Prasad, Gurumayum Robert Michael, N K Kaphungkui, N. Hemarjit Singh

    Abstract: The digital revolution has significantly impacted financial transactions, leading to a notable increase in credit card usage. However, this convenience comes with a trade-off: a substantial rise in fraudulent activities. Traditional machine learning methods for fraud detection often struggle to capture the inherent interconnectedness within financial data. This paper proposes a novel approach for…

    Submitted 10 October, 2024; originally announced October 2024.

  24. arXiv:2410.07920  [pdf, other]

    cs.HC cs.ET

    Post-Training Quantization in Brain-Computer Interfaces based on Event-Related Potential Detection

    Authors: Hubert Cecotti, Dalvir Dhaliwal, Hardip Singh, Yogesh Kumar Meena

    Abstract: Post-training quantization (PTQ) is a technique used to optimize and reduce the memory footprint and computational requirements of machine learning models. It has been used primarily for neural networks. For Brain-Computer Interfaces (BCI) that are fully portable and usable in various situations, it is necessary to provide approaches that are lightweight for storage and computation. In this paper,…

    Submitted 10 October, 2024; originally announced October 2024.

  25. arXiv:2410.03972  [pdf, other]

    cs.LG cs.IT cs.NE q-bio.NC

    Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks

    Authors: Ann Huang, Satpreet H. Singh, Flavio Martinelli, Kanaka Rajan

    Abstract: Task-trained recurrent neural networks (RNNs) are widely used in neuroscience and machine learning to model dynamical computations. To gain mechanistic insight into how neural systems solve tasks, prior work often reverse-engineers individual trained networks. However, different RNNs trained on the same task and achieving similar performance can exhibit strikingly different internal solutions-a ph…

    Submitted 28 May, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  26. arXiv:2409.08273  [pdf, other]

    cs.RO cs.AI cs.CV

    Hand-Object Interaction Pretraining from Videos

    Authors: Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik

    Abstract: We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework to use in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object in a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data gives us a task-agnostic ba…

    Submitted 12 September, 2024; originally announced September 2024.

  27. arXiv:2409.03328  [pdf, other]

    cs.NE

    Pareto Set Prediction Assisted Bilevel Multi-objective Optimization

    Authors: Bing Wang, Hemant K. Singh, Tapabrata Ray

    Abstract: Bilevel optimization problems comprise an upper level optimization task that contains a lower level optimization task as a constraint. While there is a significant and growing literature devoted to solving bilevel problems with single objective at both levels using evolutionary computation, there is relatively scarce work done to address problems with multiple objectives (BLMOP) at both levels. Fo…

    Submitted 5 September, 2024; originally announced September 2024.

  28. arXiv:2409.02384  [pdf, other]

    cs.CL cs.SD eess.AS

    STAB: Speech Tokenizer Assessment Benchmark

    Authors: Shikhar Vashishth, Harman Singh, Shikhar Bharadwaj, Sriram Ganapathy, Chulayuth Asawaroengchai, Kartik Audhkhasi, Andrew Rosenberg, Ankur Bapna, Bhuvana Ramabhadran

    Abstract: Representing speech as discrete tokens provides a framework for transforming speech into a format that closely resembles text, thus enabling the use of speech as an input to the widely successful large language models (LLMs). Currently, while several speech tokenizers have been proposed, there is ambiguity regarding the properties that are desired from a tokenizer for specific downstream tasks and…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 5 pages

  29. arXiv:2408.17011  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities

    Authors: Jutika Borah, Kumaresh Sarmah, Hidam Kumarjit Singh

    Abstract: Imaging techniques such as Chest X-rays, whole slide images, and optical coherence tomography serve as the initial screening and detection for a wide variety of medical pulmonary and ophthalmic conditions respectively. This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets with varying modalities…

    Submitted 2 September, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: 15 pages, 3 figures, 4 tables

  30. arXiv:2408.15714  [pdf, other]

    cs.CV cs.LG

    Pixels to Prose: Understanding the art of Image Captioning

    Authors: Hrishikesh Singh, Aarti Sharma, Millie Pant

    Abstract: In the era of evolving artificial intelligence, machines are increasingly emulating human-like capabilities, including visual perception and linguistic expression. Image captioning stands at the intersection of these domains, enabling machines to interpret visual content and generate descriptive text. This paper provides a thorough review of image captioning techniques, catering to individuals ent…

    Submitted 28 August, 2024; originally announced August 2024.

  31. arXiv:2408.10161  [pdf, other]

    cs.CV cs.AI cs.RO

    NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices

    Authors: Zhiyong Zhang, Aniket Gupta, Huaizu Jiang, Hanumant Singh

    Abstract: Real-time high-accuracy optical flow estimation is crucial for various real-world applications. While recent learning-based optical flow methods have achieved high accuracy, they often come with significant computational costs. In this paper, we propose a highly efficient optical flow method that balances high accuracy with reduced computational demands. Building upon NeuFlow v1, we introduce new…

    Submitted 21 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  32. arXiv:2408.04490  [pdf, ps, other]

    cs.CR math.GR

    Symmetric Encryption Scheme Based on Quasigroup Using Chained Mode of Operation

    Authors: Satish Kumar, Harshdeep Singh, Indivar Gupta, Ashok Ji Gupta

    Abstract: In this paper, we propose a novel construction for a symmetric encryption scheme, referred as SEBQ which is based on the structure of quasigroup. We utilize concepts of chaining like mode of operation and present a block cipher with in-built properties. We prove that SEBQ shows resistance against chosen plaintext attack (CPA) and by applying unbalanced Feistel transformation [19], it achieves secu…

    Submitted 8 August, 2024; originally announced August 2024.

    MSC Class: 20N05; 05B15; 94A60; 68W20

  33. arXiv:2407.10275  [pdf, other]

    cs.CL cs.AI

    Cross-Lingual Multi-Hop Knowledge Editing

    Authors: Aditi Khandelwal, Harman Singh, Hengrui Gu, Tianlong Chen, Kaixiong Zhou

    Abstract: Large language models are often expected to constantly adapt to new sources of knowledge and knowledge editing techniques aim to efficiently patch the outdated model knowledge, with minimal modification. Most prior works focus on monolingual knowledge editing in English, even though new information can emerge in any language from any part of the world. We propose the Cross-Lingual Multi-Hop Knowle…

    Submitted 15 February, 2025; v1 submitted 14 July, 2024; originally announced July 2024.

  34. arXiv:2407.04708  [pdf, other]

    cs.CV cs.LG

    QMViT: A Mushroom is worth 16x16 Words

    Authors: Siddhant Dutta, Hemant Singh, Kalpita Shankhdhar, Sridhar Iyer

    Abstract: Consuming poisonous mushrooms can have severe health consequences, even resulting in fatality and accurately distinguishing edible from toxic mushroom varieties remains a significant challenge in ensuring food safety. So, it's crucial to distinguish between edible and poisonous mushrooms within the existing species. This is essential due to the significant demand for mushrooms in people's daily me…

    Submitted 10 May, 2024; originally announced July 2024.

  35. arXiv:2407.03454  [pdf, other]

    cs.NE math.OC

    Decomposition of Difficulties in Complex Optimization Problems Using a Bilevel Approach

    Authors: Ankur Sinha, Dhaval Pujara, Hemant Kumar Singh

    Abstract: Practical optimization problems may contain different kinds of difficulties that are often not tractable if one relies on a particular optimization method. Different optimization approaches offer different strengths that are good at tackling one or more difficulty in an optimization problem. For instance, evolutionary algorithms have a niche in handling complexities like discontinuity, non-differe…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: 90C30

    ACM Class: G.0

  36. arXiv:2406.19668  [pdf, other]

    cs.CV

    PopAlign: Population-Level Alignment for Fair Text-to-Image Generation

    Authors: Shufan Li, Harkanwar Singh, Aditya Grover

    Abstract: Text-to-image (T2I) models achieve high-fidelity generation through extensive training on large datasets. However, these models may unintentionally pick up undesirable biases of their training data, such as over-representation of particular identities in gender or ethnicity neutral prompts. Existing alignment methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference O…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 18 pages, 10 figures

  37. arXiv:2406.02554  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.MM

    Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

    Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

    Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel…

    Submitted 22 March, 2024; originally announced June 2024.

  38. arXiv:2405.18948  [pdf, other]

    cs.RO cs.LG

    Learning to Recover from Plan Execution Errors during Robot Manipulation: A Neuro-symbolic Approach

    Authors: Namasivayam Kalithasan, Arnav Tuli, Vishal Bindal, Himanshu Gaurav Singh, Parag Singla, Rohan Paul

    Abstract: Automatically detecting and recovering from failures is an important but challenging problem for autonomous robots. Most of the recent work on learning to plan from demonstrations lacks the ability to detect and recover from errors in the absence of an explicit state representation and/or a (sub-) goal check function. We propose an approach (blending learning with symbolic search) for automated er…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  39. arXiv:2404.16816  [pdf, other]

    cs.CL

    IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

    Authors: Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, Partha Talukdar

    Abstract: As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 Billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse…

    Submitted 7 August, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: ACL 2024

  40. arXiv:2404.15549  [pdf, other]

    cs.CL cs.AI

    PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models

    Authors: Shashi Kant Gupta, Aditya Basu, Mauro Nievas, Jerrin Thomas, Nathan Wolfrath, Adhitya Ramamurthi, Bradley Taylor, Anai N. Kothari, Regina Schwind, Therica M. Miller, Sorena Nadaf-Rahrov, Yanshan Wang, Hrituraj Singh

    Abstract: Clinical trial matching is the task of identifying trials for which patients may be potentially eligible. Typically, this task is labor-intensive and requires detailed verification of patient electronic health records (EHRs) against the stringent inclusion and exclusion criteria of clinical trials. This process is manual, time-intensive, and challenging to scale up, resulting in many patients miss…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 30 Pages, 8 Figures, Supplementary Work Attached

  41. arXiv:2404.08085  [pdf, ps, other]

    cs.DS

    Matrix Multiplication Reductions

    Authors: Ashish Gola, Igor Shinkar, Harsimran Singh

    Abstract: In this paper we study a worst case to average case reduction for the problem of matrix multiplication over finite fields. Suppose we have an efficient average case algorithm, that given two random matrices $A,B$ outputs a matrix that has a non-trivial correlation with their product $A \cdot B$. Can we transform it into a worst case algorithm, that outputs the correct answer for all inputs without…

    Submitted 11 April, 2024; originally announced April 2024.

  42. arXiv:2404.07774  [pdf, ps, other]

    cs.LG cs.RO

    Sketch-Plan-Generalize: Learning and Planning with Neuro-Symbolic Programmatic Representations for Inductive Spatial Concepts

    Authors: Namasivayam Kalithasan, Sachit Sachdeva, Himanshu Gaurav Singh, Vishal Bindal, Arnav Tuli, Gurarmaan Singh Panjeta, Harsh Himanshu Vora, Divyanshu Aggarwal, Rohan Paul, Parag Singla

    Abstract: Effective human-robot collaboration requires the ability to learn personalized concepts from a limited number of demonstrations, while exhibiting inductive generalization, hierarchical composition, and adaptability to novel constraints. Existing approaches that use code generation capabilities of pre-trained large (vision) language models as well as purely neural models show poor generalization to…

    Submitted 17 June, 2025; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Programmatic Representations for Agent Learning Workshop, ICML 2025

  43. arXiv:2404.06680  [pdf, other]

    cs.CL

    Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology

    Authors: Shashi Kant Gupta, Aditya Basu, Bradley Taylor, Anai Kothari, Hrituraj Singh

    Abstract: Retrieving information from EHR systems is essential for answering specific questions about patient journeys and improving the delivery of clinical care. Despite this fact, most EHR systems still rely on keyword-based searches. With the advent of generative large language models (LLMs), retrieving information can lead to better search and summarization capabilities. Such retrievers can also feed R…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 18 pages

  44. arXiv:2404.04714  [pdf, other]

    cs.LG cs.AI cs.CR

    Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

    Authors: Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

    Abstract: Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to m…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at UAI 2022

  45. arXiv:2403.19885  [pdf, other]

    cs.CV cs.RO

    Towards Long Term SLAM on Thermal Imagery

    Authors: Colin Keil, Aniket Gupta, Pushyami Kaveti, Hanumant Singh

    Abstract: Visual SLAM with thermal imagery, and other low contrast visually degraded environments such as underwater, or in areas dominated by snow and ice, remain a difficult problem for many state of the art (SOTA) algorithms. In addition to challenging front-end data association, thermal imagery presents an additional difficulty for long term relocalization and map reuse. The relative temperatures of obj…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, Submitted to IROS 2024

  46. arXiv:2403.13170  [pdf, other]

    cs.RO

    On Designing Consistent Covariance Recovery from a Deep Learning Visual Odometry Engine

    Authors: Jagatpreet Singh Nir, Dennis Giaya, Hanumant Singh

    Abstract: Deep learning techniques have significantly advanced in providing accurate visual odometry solutions by leveraging large datasets. However, generating uncertainty estimates for these methods remains a challenge. Traditional sensor fusion approaches in a Bayesian framework are well-established, but deep learning techniques with millions of parameters lack efficient methods for uncertainty estimatio…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS 2024

  47. arXiv:2403.10425  [pdf, other]

    cs.CV cs.AI cs.RO

    NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices

    Authors: Zhiyong Zhang, Huaizu Jiang, Hanumant Singh

    Abstract: Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture…

    Submitted 15 March, 2024; originally announced March 2024.

  48. arXiv:2403.01628  [pdf, ps, other]

    cs.LG

    Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium

    Authors: Hyewon Jeong, Sarah Jabbour, Yuzhe Yang, Rahul Thapta, Hussein Mozannar, William Jongwon Han, Nikita Mehandru, Michael Wornow, Vladislav Lialin, Xin Liu, Alejandro Lozano, Jiacheng Zhu, Rafal Dariusz Kocielnik, Keith Harrigian, Haoran Zhang, Edward Lee, Milos Vukadinovic, Aparna Balagopalan, Vincent Jeanselme, Katherine Matton, Ilker Demirel, Jason Fries, Parisa Rashidi, Brett Beaulieu-Jones, Xuhai Orson Xu , et al. (18 additional authors not shown)

    Abstract: The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four vir…

    Submitted 5 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: ML4H 2023, Research Roundtables

  49. arXiv:2402.17412  [pdf, other]

    cs.CV

    DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models

    Authors: Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen

    Abstract: In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced se…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Project Page: https://diffusekrona.github.io/

  50. arXiv:2402.14254  [pdf, other]

    cs.LG stat.ML

    A hierarchical decomposition for explaining ML performance discrepancies

    Authors: Jean Feng, Harvineet Singh, Fan Xia, Adarsh Subbaswamy, Alexej Gossmann

    Abstract: Machine learning (ML) algorithms can often differ in performance across domains. Understanding $\textit{why}$ their performance differs is crucial for determining what types of interventions (e.g., algorithmic or operational) are most effective at closing the performance gaps. Existing methods focus on $\textit{aggregate decompositions}$ of the total performance gap into the impact of a shift in t…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures in main body; 14 pages and 2 figures in appendices