Showing 1–50 of 527 results for author: Vinay

Searching in archive cs.
  1. arXiv:2506.16596  [pdf, ps, other]

    cs.AI

    A Community-driven vision for a new Knowledge Resource for AI

    Authors: Vinay K Chaudhri, Chaitan Baru, Brandon Bennett, Mehul Bhatt, Darion Cassel, Anthony G Cohn, Rina Dechter, Esra Erdem, Dave Ferrucci, Ken Forbus, Gregory Gelfond, Michael Genesereth, Andrew S. Gordon, Benjamin Grosof, Gopal Gupta, Jim Hendler, Sharat Israni, Tyler R. Josephson, Patrick Kyllonen, Yuliya Lierler, Vladimir Lifschitz, Clifton McFate, Hande K. McGinty, Leora Morgenstern, Alessandro Oltramari, et al. (7 additional authors not shown)

    Abstract: The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language m…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 17 pages

  2. arXiv:2506.16537  [pdf]

    cs.RO eess.SY

    Agile, Autonomous Spacecraft Constellations with Disruption Tolerant Networking to Monitor Precipitation and Urban Floods

    Authors: Sreeja Roy-Singh, Alan P. Li, Vinay Ravindra, Roderick Lammers, Marc Sanchez Net

    Abstract: Fully re-orientable small spacecraft are now supported by commercial technologies, allowing them to point their instruments in any direction and capture images, with short notice. When combined with improved onboard processing, and implemented on a constellation of inter-communicable satellites, this intelligent agility can significantly increase responsiveness to transient or evolving phenomena.…

    Submitted 19 June, 2025; originally announced June 2025.

    Journal ref: Robotics Science and Systems (RSS 2025) - Space Robotics Workshop

  3. arXiv:2506.12154  [pdf, ps, other]

    cs.SD eess.AS

    Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding

    Authors: Haoran Zhou, Xingchen Song, Brendan Fahy, Qiaochu Song, Binbin Zhang, Zhendong Peng, Anshul Wadhawan, Denglin Jiang, Apurv Verma, Vinay Ramesh, Srivas Prasad, Michele M. Franceschini

    Abstract: OpenAI Whisper is a family of robust Automatic Speech Recognition (ASR) models trained on 680,000 hours of audio. However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) structure. We introduce an additional Conn…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  4. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  5. arXiv:2506.09197  [pdf, other]

    cs.NI

    Adaptive Bandwidth Sharing for Optimizing QoE of Real-Time Video

    Authors: Sushi Anna George, Vinay Joseph

    Abstract: The concept of spectrum or bandwidth sharing has gained significant global attention as a means to enhance the efficiency of real-time traffic management in wireless networks. Effective bandwidth sharing enables optimal utilization of available resources, reducing congestion and improving QoE for delay-sensitive applications such as real-time video transmission. In this paper, we propose a novel i…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: arXiv admin note: text overlap with arXiv:2401.10681

  6. arXiv:2506.05104  [pdf, other]

    cs.SD cs.AI cs.LG

    Survey on the Evaluation of Generative Models in Music

    Authors: Alexander Lerch, Claire Arthur, Nick Bryan-Kinns, Corey Ford, Qianyi Sun, Ashvala Vinay

    Abstract: Research on generative systems in music has seen considerable attention and growth in recent years. A variety of attempts have been made to systematically evaluate such systems. We provide an interdisciplinary review of the common evaluation targets, methodologies, and metrics for the evaluation of both system output and model usability, covering subjective and objective approaches, qualitative an…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Submitted to ACM CSUR, 26-Jun-2024

  7. arXiv:2506.04642  [pdf, ps, other]

    cs.CL

    TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering

    Authors: Vinay Joshi, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum

    Abstract: The key-value (KV) cache in transformer models is a critical component for efficient decoding or inference, yet its memory demands scale poorly with sequence length, posing a major challenge for scalable deployment of large language models. Among several approaches to KV cache compression, quantization of key and value activations has been widely explored. Most KV cache quantization methods still…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: ACL-2025 industry-track accepted

  8. arXiv:2506.04514  [pdf, ps, other]

    cs.NI cs.AI

    BEAR: BGP Event Analysis and Reporting

    Authors: Hanqing Li, Melania Fedeli, Vinay Kolar, Diego Klabjan

    Abstract: The Internet comprises interconnected, independently managed Autonomous Systems (AS) that rely on the Border Gateway Protocol (BGP) for inter-domain routing. BGP anomalies--such as route leaks and hijacks--can divert traffic through unauthorized or inefficient paths, jeopardizing network reliability and security. Although existing rule-based and machine learning methods can detect these anomali…

    Submitted 4 June, 2025; originally announced June 2025.

  9. arXiv:2505.21410  [pdf, ps, other]

    cs.AI cs.LG cs.RO

    MRSD: Multi-Resolution Skill Discovery for HRL Agents

    Authors: Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

    Abstract: Hierarchical reinforcement learning (HRL) relies on abstract skills to solve long-horizon tasks efficiently. While existing skill discovery methods learn these skills automatically, they are limited to a single skill per task. In contrast, humans learn and use both fine-grained and coarse motor skills simultaneously. Inspired by human motor control, we propose Multi-Resolution Skill Discovery (MR…

    Submitted 27 May, 2025; originally announced May 2025.

  10. Securing Credit Inquiries: The Role of Real-Time User Approval in Preventing SSN Identity Theft

    Authors: Gogulakrishnan Thiyagarajan, Vinay Bist, Prabhudarshi Nayak

    Abstract: Unauthorized credit inquiries are also a central entry point for identity theft, with Social Security Numbers (SSNs) being widely utilized in fraudulent cases. Traditional credit inquiry systems do not usually possess strict user authentication, making them vulnerable to unauthorized access. This paper proposes a real-time user authorization system to enhance security by enforcing explicit user ap…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 13 pages, 7 figures

    Report number: Vol. 10 No. 35s (2025)

  11. The Hidden Dangers of Outdated Software: A Cyber Security Perspective

    Authors: Gogulakrishnan Thiyagarajan, Vinay Bist, Prabhudarshi Nayak

    Abstract: Outdated software remains a potent and underappreciated menace in 2025's cybersecurity environment, exposing systems to a broad array of threats, including ransomware, data breaches, and operational outages that can have devastating and far-reaching impacts. This essay explores the unseen threats of cyberattacks by presenting robust statistical information, including the staggering reality that 32…

    Submitted 20 May, 2025; originally announced May 2025.

  12. arXiv:2505.13448  [pdf, other]

    cs.CL cs.AI

    CIE: Controlling Language Model Text Generations Using Continuous Signals

    Authors: Vinay Samuel, Harshita Diddee, Yiming Zhang, Daphne Ippolito

    Abstract: Aligning language models with user intent is becoming increasingly relevant to enhance user experience. This calls for designing methods that can allow users to control the properties of the language that LMs generate. For example, controlling the length of the generation, the complexity of the language that gets chosen, the sentiment, tone, etc. Most existing work attempts to integrate users' con…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 10 pages, 3 figures

  13. arXiv:2505.12217  [pdf, ps, other]

    cs.CV

    Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models

    Authors: Aryan Das, Tanishq Rachamalla, Pravendra Singh, Koushik Biswas, Vinay Kumar Verma, Swalpa Kumar Roy

    Abstract: We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery.…

    Submitted 17 May, 2025; originally announced May 2025.

  14. arXiv:2505.08414  [pdf]

    eess.IV cs.CV

    An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care

    Authors: Zhi Da Soh, Yang Bai, Kai Yu, Yang Zhou, Xiaofeng Lei, Sahil Thakur, Zann Lee, Lee Ching Linette Phang, Qingsheng Peng, Can Can Xue, Rachel Shujuan Chong, Quan V. Hoang, Lavanya Raghavan, Yih Chung Tham, Charumathi Sabanayagam, Wei-Chi Wu, Ming-Chih Ho, Jiangnan He, Preeti Gupta, Ecosse Lamoureux, Seang Mei Saw, Vinay Nangia, Songhomitra Panda-Jonas, Jie Xu, Ya Xing Wang, et al. (6 additional authors not shown)

    Abstract: Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…

    Submitted 13 May, 2025; originally announced May 2025.

  15. arXiv:2505.00432  [pdf, other]

    cs.RO

    A Neural Network Mode for PX4 on Embedded Flight Controllers

    Authors: Sindre M. Hegre, Welf Rehberg, Mihir Kulkarni, Kostas Alexis

    Abstract: This paper contributes an open-sourced implementation of a neural-network based controller framework within the PX4 stack. We develop a custom module for inference on the microcontroller while retaining all of the functionality of the PX4 autopilot. Policies trained in the Aerial Gym Simulator are converted to the TensorFlow Lite format and then built together with PX4 and flashed to the flight co…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: 4 pages. Accepted to the Workshop on 25 Years of Aerial Robotics: Challenges and Opportunities (ICRA 2025)

  16. arXiv:2504.18444  [pdf, ps, other]

    eess.SY cs.LG math.OC

    Boosting-Enabled Robust System Identification of Partially Observed LTI Systems Under Heavy-Tailed Noise

    Authors: Vinay Kanakeri, Aritra Mitra

    Abstract: We consider the problem of system identification of partially observed linear time-invariant (LTI) systems. Given input-output data, we provide non-asymptotic guarantees for identifying the system parameters under general heavy-tailed noise processes. Unlike previous works that assume Gaussian or sub-Gaussian noise, we consider significantly broader noise distributions that are required to admit o…

    Submitted 25 April, 2025; originally announced April 2025.

  17. arXiv:2504.13768  [pdf, other]

    cs.LG cs.CE physics.comp-ph

    Equi-Euler GraphNet: An Equivariant, Temporal-Dynamics Informed Graph Neural Network for Dual Force and Trajectory Prediction in Multi-Body Systems

    Authors: Vinay Sharma, Rémi Tanguy Oddon, Pietro Tesini, Jens Ravesloot, Cees Taal, Olga Fink

    Abstract: Accurate real-time modeling of multi-body dynamical systems is essential for enabling digital twin applications across industries. While many data-driven approaches aim to learn system dynamics, jointly predicting internal loads and system trajectories remains a key challenge. This dual prediction is especially important for fault detection and predictive maintenance, where internal loads-such as…

    Submitted 25 April, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

    Comments: Reuploaded with new version-- equation 16 was incorrect

  18. arXiv:2504.12914  [pdf, other]

    cs.CY

    In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?

    Authors: Ben Bucknall, Saad Siddiqui, Lara Thurnherr, Conor McGurk, Ben Harack, Anka Reuel, Patricia Paskov, Casey Mahoney, Sören Mindermann, Scott Singer, Vinay Hiremath, Charbel-Raphaël Segerie, Oscar Delaney, Alessandro Abate, Fazl Barez, Michael K. Cohen, Philip Torr, Ferenc Huszár, Anisoara Calinescu, Gabriel Davis Jones, Yoshua Bengio, Robert Trager

    Abstract: International cooperation is common in AI research, including between geopolitical rivals. While many experts advocate for greater international cooperation on AI safety to address shared global risks, some view cooperation on AI with suspicion, arguing that it can pose unacceptable risks to national security. However, the extent to which cooperation on AI safety poses such risks, as well as provi…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Accepted to ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

  19. arXiv:2504.12354  [pdf, other]

    eess.IV cs.AI

    WaterFlow: Learning Fast & Robust Watermarks using Stable Diffusion

    Authors: Vinay Shukla, Prachee Sharma, Ryan Rossi, Sungchul Kim, Tong Yu, Aditya Grover

    Abstract: The ability to embed watermarks in images is a fundamental problem of interest for computer vision, and is exacerbated by the rapid rise of generated imagery in recent times. Current state-of-the-art techniques suffer from computational and statistical challenges such as the slow execution speed for practical deployments. In addition, other works trade off fast watermarking speeds but suffer great…

    Submitted 17 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  20. arXiv:2504.06861  [pdf, other]

    cs.CV cs.AI

    EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation

    Authors: Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri

    Abstract: Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models. Current methods in this space require specific architectural changes to image generation models, which limit their adaptability and scalability. In contrast to such methods, we provide a model-agnostic approach. We use intersections in diffusi…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

  21. arXiv:2504.05228  [pdf, other]

    cs.CL

    NoveltyBench: Evaluating Language Models for Humanlike Diversity

    Authors: Yiming Zhang, Harshita Diddee, Susan Holm, Hanchen Liu, Xinyue Liu, Vinay Samuel, Barry Wang, Daphne Ippolito

    Abstract: Language models have demonstrated remarkable capabilities on standard benchmarks, yet they struggle increasingly from mode collapse, the inability to generate diverse and novel outputs. Our work introduces NoveltyBench, a benchmark specifically designed to evaluate the ability of language models to produce multiple distinct and high-quality outputs. NoveltyBench utilizes prompts curated to elicit…

    Submitted 24 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  22. arXiv:2503.19090  [pdf, other]

    cs.CL

    LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment

    Authors: Varsha Embar, Ritvik Shrivastava, Vinay Damodaran, Travis Mehlinger, Yu-Chung Hsiao, Karthik Raghunathan

    Abstract: Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper delineates our system that automates call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering a…

    Submitted 24 March, 2025; originally announced March 2025.

  23. arXiv:2503.14828  [pdf, other]

    cs.CL cs.AI

    The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval

    Authors: Firoj Alam, Julia Maria Struß, Tanmoy Chakraborty, Stefan Dietze, Salim Hafid, Katerina Korre, Arianna Muti, Preslav Nakov, Federico Ruggeri, Sebastian Schellhammer, Vinay Setty, Megha Sundriyal, Konstantin Todorov, Venktesh V

    Abstract: The CheckThat! lab aims to advance the development of innovative technologies designed to identify and counteract online disinformation and manipulation efforts across various languages and platforms. The first five editions focused on key tasks in the information verification pipeline, including check-worthiness, evidence retrieval and pairing, and verification. Since the 2023 edition, the lab ha…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: misinformation, factuality, fact-checking, fact-checkers, check-worthiness, Social Media Platforms

    MSC Class: 68T50

    ACM Class: I.2; I.2.7

  24. arXiv:2503.07506  [pdf, other]

    cs.LG cs.CV

    ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning

    Authors: Soumya Banerjee, Vinay Kumar Verma

    Abstract: Active learning aims to select optimal samples for labeling, minimizing annotation costs. This paper introduces a unified representation learning framework tailored for active learning with task awareness. It integrates diverse sources, comprising reconstruction, adversarial, self-supervised, knowledge-distillation, and classification losses into a unified VAE-based ADROIT approach. The proposed a…

    Submitted 10 March, 2025; originally announced March 2025.

  25. arXiv:2503.06296  [pdf, other]

    cs.CL cs.LG

    MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering

    Authors: Vinay Kumar Verma, Shreyas Sunil Kulkarni, Happy Mittal, Deepak Gupta

    Abstract: Question Answering (QA) and Visual Question Answering (VQA) are well-studied problems in the language and vision domain. One challenging scenario involves multiple sources of information, each of a different modality, where the answer to the question may exist in one or more sources. This scenario contains richer information but is highly complex to handle. In this work, we formulate a novel quest…

    Submitted 8 March, 2025; originally announced March 2025.

    Comments: To appear at NAACL Industry Track

  26. Organize, Then Vote: Exploring Cognitive Load in Quadratic Survey Interfaces

    Authors: Ti-Chung Cheng, Yutong Zhang, Yi-Hung Chou, Vinay Koshy, Tiffany Wenting Li, Karrie Karahalios, Hari Sundaram

    Abstract: Quadratic Surveys (QSs) elicit more accurate preferences than traditional methods like Likert-scale surveys. However, the cognitive load associated with QSs has hindered their adoption in digital surveys for collective decision-making. We introduce a two-phase "organize-then-vote" QS to reduce cognitive load. As interface design significantly impacts survey results and accuracy, our design scaffol…

    Submitted 16 May, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    ACM Class: H.5.2

    Journal ref: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25), Article 475, 35 pages, ACM, New York, NY, USA

  27. arXiv:2503.01471  [pdf, other]

    cs.RO

    Aerial Gym Simulator: A Framework for Highly Parallelized Simulation of Aerial Robots

    Authors: Mihir Kulkarni, Welf Rehberg, Kostas Alexis

    Abstract: This paper contributes the Aerial Gym Simulator, a highly parallelized, modular framework for simulation and rendering of arbitrary multirotor platforms based on NVIDIA Isaac Gym. Aerial Gym supports the simulation of under-, fully- and over-actuated multirotors offering parallelized geometric controllers, alongside a custom GPU-accelerated rendering framework for ray-casting capable of capturing…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)

  28. arXiv:2502.14145  [pdf, other]

    cs.CL eess.AS

    LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems

    Authors: Hao Zhang, Weiwei Li, Rilin Chen, Vinay Kothapally, Meng Yu, Dong Yu

    Abstract: Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD pred…

    Submitted 24 February, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: In submission to INTERSPEECH 2025

  29. arXiv:2502.10481  [pdf, other]

    cs.LG

    Chronic Diseases Prediction Using ML

    Authors: Sri Varsha Mulakala, G. Neeharika, P. Vinay Kumar, A. Bhargava Kiran

    Abstract: The recent increase in morbidity is primarily due to chronic diseases including Diabetes, Heart disease, Lung cancer, and brain tumours. The results for patients can be improved, and the financial burden on the healthcare system can be lessened, through the early detection and prevention of certain disorders. In this study, we built a machine-learning model for predicting the existence of numerous…

    Submitted 13 February, 2025; originally announced February 2025.

  30. arXiv:2502.06006  [pdf, other]

    cs.IR

    FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking

    Authors: Venktesh V, Vinay Setty

    Abstract: The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. A significant challenge in this process is not only retrieving relevant information, but also identifying evidence that can both support and refute complex claims. Traditional retrieval methods may return documents that directly address claims or l…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted to WWW 2025 resource track

  31. arXiv:2502.05803  [pdf, other]

    cs.IR

    FlashCheck: Exploration of Efficient Evidence Retrieval for Fast Fact-Checking

    Authors: Kevin Nanekhan, Venktesh V, Erik Martin, Henrik Vatndal, Vinay Setty, Avishek Anand

    Abstract: The advances in digital tools have led to the rampant spread of misinformation. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. It is essential for automated fact-checking to be efficient for aiding in combating misinformation in real-time and at the source. Fact-checking pipelines primarily comprise a knowledge retrieval component which extracts relev…

    Submitted 16 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted to ECIR 2025, 15 pages

  32. arXiv:2502.04695  [pdf, other]

    cs.AI cs.CE cs.ET cs.LG

    Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance

    Authors: Pratinav Seth, Vinay Kumar Sankarapu

    Abstract: This position paper emphasizes the critical gap in the evaluation of Explainable AI (XAI) due to the lack of standardized and reliable metrics, which diminishes its practical value, trustworthiness, and ability to meet regulatory requirements. Current evaluation methods are often fragmented, subjective, and biased, making them prone to manipulation and complicating the assessment of complex models…

    Submitted 7 February, 2025; originally announced February 2025.

  33. arXiv:2502.03014  [pdf, other]

    cs.LG cs.AI cs.ET

    xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods

    Authors: Pratinav Seth, Yashwardhan Rathore, Neeraj Kumar Singh, Chintan Chitroda, Vinay Kumar Sankarapu

    Abstract: The growing complexity of machine learning and deep learning models has led to an increased reliance on opaque "black box" systems, making it difficult to understand the rationale behind predictions. This lack of transparency is particularly challenging in high-stakes applications where interpretability is as important as accuracy. Post-hoc explanation methods are commonly used to interpret these…

    Submitted 5 February, 2025; originally announced February 2025.

  34. arXiv:2502.01956  [pdf, other]

    cs.RO cs.AI cs.LG

    DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents

    Authors: Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

    Abstract: Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goal…

    Submitted 27 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  35. arXiv:2502.01653  [pdf, other]

    quant-ph cs.NI

    Quantum Internet: Technologies, Protocols, and Research Challenges

    Authors: Vinay Kumar, Claudio Cicconetti, Marco Conti, Andrea Passarella

    Abstract: As the field of the quantum internet advances, a comprehensive guide to navigate its complexities has become increasingly crucial. While quantum computing shares foundational principles with the quantum internet, distinguishing between the two is essential for further development and deeper understanding. This work systematically introduces the quantum internet by discussing its importance, core c…

    Submitted 18 March, 2025; v1 submitted 30 January, 2025; originally announced February 2025.

    Comments: 50 pages, 11 figures

  36. arXiv:2502.01402  [pdf, other]

    cs.CL

    Annotation Tool and Dataset for Fact-Checking Podcasts

    Authors: Vinay Setty, Adam James Becker

    Abstract: Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is a challenging task, requiring transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool offers a novel approach to tackle these challenges by enabling real-time annotation of podcasts d…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: Accepted as resource paper in TheWebConf 2025

  37. arXiv:2501.15486  [pdf, other]

    cs.LG cs.AI cs.CV cs.DC

    FedAlign: Federated Domain Generalization with Cross-Client Feature Alignment

    Authors: Sunny Gupta, Vinay Sutar, Varunav Singh, Amit Sethi

    Abstract: Federated Learning (FL) offers a decentralized paradigm for collaborative model training without direct data sharing, yet it poses unique challenges for Domain Generalization (DG), including strict privacy constraints, non-i.i.d. local data, and limited domain diversity. We introduce FedAlign, a lightweight, privacy-preserving framework designed to enhance DG in federated settings by simultaneousl…

    Submitted 26 January, 2025; originally announced January 2025.

    Comments: 9 pages, 4 figures

    ACM Class: I.2.6; C.1.4; D.1.3; I.5.1; H.3.4; I.2.10; I.4.0; I.4.1; I.4.2; I.4.6; I.4.7; I.4.8; I.4.9; I.4.10; I.5.1; I.5.2; I.5.4; J.2; I.2.11; I.2.10

  38. arXiv:2501.12016  [pdf]

    cs.CV cs.LG

    Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection?

    Authors: Samantha Min Er Yew, Xiaofeng Lei, Jocelyn Hui Lin Goh, Yibing Chen, Sahana Srinivasan, Miao-li Chee, Krithi Pushpanathan, Ke Zou, Qingshan Hou, Zhi Da Soh, Cancan Xue, Marco Chak Yan Yu, Charumathi Sabanayagam, E Shyong Tai, Xueling Sim, Yaxing Wang, Jost B. Jonas, Vinay Nangia, Gabriel Dawei Yang, Emma Anran Ran, Carol Yim-Lui Cheung, Yangqin Feng, Jun Zhou, Rick Siow Mong Goh, Yukun Zhou, et al. (4 additional authors not shown)

    Abstract: Background: RETFound, a self-supervised, retina-specific foundation model (FM), showed potential in downstream applications. However, its comparative performance with traditional deep learning (DL) models remains incompletely understood. This study aimed to evaluate RETFound against three ImageNet-pretrained supervised DL models (ResNet50, ViT-base, SwinV2) in detecting ocular and systemic disease…

    Submitted 21 January, 2025; originally announced January 2025.

  39. arXiv:2501.07857  [pdf, other]

    cs.SE cs.AI

    Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs

    Authors: Nilesh Dhulshette, Sapan Shah, Vinay Kulkarni

    Abstract: In large-scale software development, understanding the functionality and intent behind complex codebases is critical for effective development and maintenance. While code summarization has been widely studied, existing methods primarily focus on smaller code units, such as functions, and struggle with larger code artifacts like files and packages. Additionally, current summarization models tend to…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: To appear at LLM4Code@ICSE 2025

  40. arXiv:2501.07373  [pdf, other]

    cs.LG cs.CE physics.comp-ph

    Dynami-CAL GraphNet: A Physics-Informed Graph Neural Network Conserving Linear and Angular Momentum for Dynamical Systems

    Authors: Vinay Sharma, Olga Fink

    Abstract: Accurate, interpretable, and real-time modeling of multi-body dynamical systems is essential for predicting behaviors and inferring physical properties in natural and engineered environments. Traditional physics-based models face scalability challenges and are computationally demanding, while data-driven approaches like Graph Neural Networks (GNNs) often lack physical consistency, interpretability…

    Submitted 13 January, 2025; originally announced January 2025.

  41. arXiv:2501.03839  [pdf, other]

    eess.IV cs.CV

    MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention

    Authors: Aadya Arora, Vinay Namboodiri

    Abstract: With the popularity of foundational models, parameter-efficient fine-tuning has become the de facto approach to leverage pretrained models to perform downstream tasks. Taking inspiration from recent advances in large language models, Visual Prompt Tuning and similar techniques learn an additional prompt to efficiently fine-tune a pretrained vision foundational model. However, we observe that such…

    Submitted 7 January, 2025; originally announced January 2025.

  42. arXiv:2501.00421  [pdf, ps, other]

    eess.SY cs.LG math.OC

    Outlier-Robust Linear System Identification Under Heavy-tailed Noise

    Authors: Vinay Kanakeri, Aritra Mitra

    Abstract: We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., "light-tailed". In sharp contrast, we…

    Submitted 27 May, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

  43. arXiv:2412.18775  [pdf, other]

    cs.CV cs.AI cs.LG

    ObitoNet: Multimodal High-Resolution Point Cloud Reconstruction

    Authors: Apoorv Thapliyal, Vinay Lanka, Swathi Baskaran

    Abstract: ObitoNet employs a Cross Attention mechanism to integrate multimodal inputs, where Vision Transformers (ViT) extract semantic features from images and a point cloud tokenizer processes geometric information using Farthest Point Sampling (FPS) and K Nearest Neighbors (KNN) for spatial structure capture. The learned multimodal features are fed into a transformer-based decoder for high-resolution poi…

    Submitted 24 December, 2024; originally announced December 2024.

  44. arXiv:2412.17304  [pdf, other]

    cs.AI

    On the Feasibility of Vision-Language Models for Time-Series Classification

    Authors: Vinay Prithyani, Mohsin Mohammed, Richa Gadgil, Ricardo Buitrago, Vinija Jain, Aman Chadha

    Abstract: We build upon time-series classification by leveraging the capabilities of Vision Language Models (VLMs). We find that VLMs produce competitive results after two or fewer epochs of fine-tuning. We develop a novel approach that incorporates graphical data representations as images in conjunction with numerical data. This approach is rooted in the hypothesis that graphical representations can provide…

    Submitted 17 January, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

  45. arXiv:2412.15701  [pdf, other]

    cs.AI cs.CL cs.HC

    Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

    Authors: Yijia Shao, Vinay Samuel, Yucheng Jiang, John Yang, Diyi Yang

    Abstract: Recent advancements in language models (LMs) have sparked growing interest in developing LM agents. While fully autonomous agents could excel in many scenarios, numerous use cases inherently require them to collaborate with humans due to humans' latent preferences, domain expertise, or need for control. To facilitate the study of human-agent collaboration, we present Collaborative Gym (Co-Gym), a…

    Submitted 16 January, 2025; v1 submitted 20 December, 2024; originally announced December 2024.

    Comments: Preprint. Work in progress

  46. arXiv:2412.07739  [pdf, other]

    cs.CV cs.AI cs.GR

    GASP: Gaussian Avatars with Synthetic Priors

    Authors: Jack Saunders, Charlie Hewitt, Yanan Jian, Marek Kowalski, Tadas Baltrusaitis, Yiye Chen, Darren Cosker, Virginia Estellers, Nicholas Gyde, Vinay P. Namboodiri, Benjamin E Lundell

    Abstract: Gaussian Splatting has changed the game for real-time photo-realistic rendering. One of the most popular applications of Gaussian Splatting is to create animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of quality and rendering efficiency but suffer from two main limitations. Either they require expensive multi-camera rigs to produce avatars with free-view rend…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Project page: https://microsoft.github.io/GASP/

  47. arXiv:2412.04259  [pdf, other]

    cs.CR cs.LG

    SCADE: Scalable Framework for Anomaly Detection in High-Performance System

    Authors: Vaishali Vinay, Anjali Mangal

    Abstract: As command-line interfaces remain integral to high-performance computing environments, the risk of exploitation through stealthy and complex command-line abuse grows. Conventional security solutions struggle to detect these anomalies due to their context-specific nature, lack of labeled data, and the prevalence of sophisticated attacks like Living-off-the-Land (LOL). To address this gap, we introd…

    Submitted 9 December, 2024; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Updated title and abstract for broader scope. Submitted to ACM CODASPY (The 15th ACM Conference on Data and Application Security and Privacy) Conference

  48. arXiv:2412.01150  [pdf]

    astro-ph.HE astro-ph.IM cs.AI cs.LG

    Representation Learning for Time-Domain High-Energy Astrophysics: Discovery of Extragalactic Fast X-ray Transient XRT 200515

    Authors: Steven Dillmann, Juan Rafael Martínez-Galarza, Roberto Soria, Rosanne Di Stefano, Vinay L. Kashyap

    Abstract: We present a novel representation learning method for downstream tasks like anomaly detection, unsupervised classification, and similarity searches in high-energy data sets. This enabled the discovery of a new extragalactic fast X-ray transient (FXT) in Chandra archival data, XRT 200515, a needle-in-the-haystack event and the first Chandra FXT of its kind. Recent serendipitous discoveries in X-ray…

    Submitted 3 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: 25 pages, accepted in Monthly Notices of the Royal Astronomical Society

    Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 537, Issue 2, February 2025

  49. arXiv:2411.12643  [pdf, other]

    cs.LG cs.AI cs.CL

    DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models

    Authors: Vinay Kumar Sankarapu, Chintan Chitroda, Yashwardhan Rathore, Neeraj Kumar Singh, Pratinav Seth

    Abstract: The rapid growth of AI has led to more complex deep learning models, often operating as opaque "black boxes" with limited transparency in their decision-making. This lack of interpretability poses challenges, especially in high-stakes applications where understanding model output is crucial. This work highlights the importance of interpretability in fostering trust, accountability, and responsible…

    Submitted 4 February, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

  50. arXiv:2411.12074  [pdf, other]

    cs.CL cs.LG

    Mitigating Gender Bias in Contextual Word Embeddings

    Authors: Navya Yarrabelly, Vinay Damodaran, Feng-Guang Su

    Abstract: Word embeddings have been shown to produce remarkable results in tackling a vast majority of NLP related tasks. Unfortunately, word embeddings also capture the stereotypical biases that are prevalent in society, affecting the predictive performance of the embeddings when used in downstream tasks. While various techniques have been proposed \cite{bolukbasi2016man, zhao2018learning} and criticized\c…

    Submitted 18 November, 2024; originally announced November 2024.