-
A Community-driven vision for a new Knowledge Resource for AI
Authors:
Vinay K Chaudhri,
Chaitan Baru,
Brandon Bennett,
Mehul Bhatt,
Darion Cassel,
Anthony G Cohn,
Rina Dechter,
Esra Erdem,
Dave Ferrucci,
Ken Forbus,
Gregory Gelfond,
Michael Genesereth,
Andrew S. Gordon,
Benjamin Grosof,
Gopal Gupta,
Jim Hendler,
Sharat Israni,
Tyler R. Josephson,
Patrick Kyllonen,
Yuliya Lierler,
Vladimir Lifschitz,
Clifton McFate,
Hande K. McGinty,
Leora Morgenstern,
Alessandro Oltramari
, et al. (7 additional authors not shown)
Abstract:
The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language m…
▽ More
The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose widely available sources of knowledge remain a critical deficiency in AI infrastructure. Large language models struggle due to knowledge gaps; robotic planning lacks necessary world knowledge; and the detection of factually false information relies heavily on human expertise. What kind of knowledge resource is most needed in AI today? How can modern technology shape its development and evaluation? A recent AAAI workshop gathered over 50 researchers to explore these questions. This paper synthesizes our findings and outlines a community-driven vision for a new knowledge infrastructure. In addition to leveraging contemporary advances in knowledge representation and reasoning, one promising idea is to build an open engineering framework to exploit knowledge modules effectively within the context of practical applications. Such a framework should include sets of conventions and social structures that are adopted by contributors.
△ Less
Submitted 19 June, 2025;
originally announced June 2025.
-
Agile, Autonomous Spacecraft Constellations with Disruption Tolerant Networking to Monitor Precipitation and Urban Floods
Authors:
Sreeja Roy-Singh,
Alan P. Li,
Vinay Ravindra,
Roderick Lammers,
Marc Sanchez Net
Abstract:
Fully re-orientable small spacecraft are now supported by commercial technologies, allowing them to point their instruments in any direction and capture images, with short notice. When combined with improved onboard processing, and implemented on a constellation of inter-communicable satellites, this intelligent agility can significantly increase responsiveness to transient or evolving phenomena.…
▽ More
Fully re-orientable small spacecraft are now supported by commercial technologies, allowing them to point their instruments in any direction and capture images, with short notice. When combined with improved onboard processing, and implemented on a constellation of inter-communicable satellites, this intelligent agility can significantly increase responsiveness to transient or evolving phenomena. We demonstrate a ground-based and onboard algorithmic framework that combines orbital mechanics, attitude control, inter-satellite communication, intelligent prediction and planning to schedule the time-varying, re-orientation of agile, small satellites in a constellation. Planner intelligence is improved by updating the predictive value of future space-time observations based on shared observations of evolving episodic precipitation and urban flood forecasts. Reliable inter-satellite communication within a fast, dynamic constellation topology is modeled in the physical, access control and network layer. We apply the framework on a representative 24-satellite constellation observing 5 global regions. Results show appropriately low latency in information exchange (average within 1/3rd available time for implicit consensus), enabling the onboard scheduler to observe ~7% more flood magnitude than a ground-based implementation. Both onboard and offline versions performed ~98% better than constellations without agility.
△ Less
Submitted 19 June, 2025;
originally announced June 2025.
-
Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
Authors:
Haoran Zhou,
Xingchen Song,
Brendan Fahy,
Qiaochu Song,
Binbin Zhang,
Zhendong Peng,
Anshul Wadhawan,
Denglin Jiang,
Apurv Verma,
Vinay Ramesh,
Srivas Prasad,
Michele M. Franceschini
Abstract:
OpenAI Whisper is a family of robust Automatic Speech Recognition (ASR) models trained on 680,000 hours of audio. However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) structure. We introduce an additional Conn…
▽ More
OpenAI Whisper is a family of robust Automatic Speech Recognition (ASR) models trained on 680,000 hours of audio. However, its encoder-decoder architecture, trained with a sequence-to-sequence objective, lacks native support for streaming ASR. In this paper, we fine-tune Whisper for streaming ASR using the WeNet toolkit by adopting a Unified Two-pass (U2) structure. We introduce an additional Connectionist Temporal Classification (CTC) decoder trained with causal attention masks to generate streaming partial transcripts, while the original Whisper decoder reranks these partial outputs. Our experiments on LibriSpeech and an earnings call dataset demonstrate that, with adequate fine-tuning data, Whisper can be adapted into a capable streaming ASR model. We also introduce a hybrid tokenizer approach, which uses a smaller token space for the CTC decoder while retaining Whisper's original token space for the attention decoder, resulting in improved data efficiency and generalization.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…
▽ More
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
△ Less
Submitted 17 March, 2025;
originally announced June 2025.
-
Adaptive Bandwidth Sharing for Optimizing QoE of Real-Time Video
Authors:
Sushi Anna George,
Vinay Joseph
Abstract:
The concept of spectrum or bandwidth sharing has gained significant global attention as a means to enhance the efficiency of real-time traffic management in wireless networks. Effective bandwidth sharing enables optimal utilization of available resources, reducing congestion and improving QoE for delay-sensitive applications such as real-time video transmission. In this paper, we propose a novel i…
▽ More
The concept of spectrum or bandwidth sharing has gained significant global attention as a means to enhance the efficiency of real-time traffic management in wireless networks. Effective bandwidth sharing enables optimal utilization of available resources, reducing congestion and improving QoE for delay-sensitive applications such as real-time video transmission. In this paper, we propose a novel iterative semi-static bandwidth sharing policy that balances the advantages of both static and dynamic sharing approaches. Our approach minimizes the frequency of coordination between network operators while ensuring efficient resource allocation and meeting the stringent QoE demands of real-time traffic. The proposed policy iteratively optimizes both the spectrum sharing between operators and the resource allocation for individual clients. We establish strong theoretical guarantees for the optimality of the proposed policy and prove that it converges to the optimal static sharing policy irrespective of initial conditions or fluctuations in traffic arrival rates. Additionally, we conduct extensive simulations to evaluate the impact of key system parameters - including step size, hyperperiod length, and arrival process dynamics - on the performance of our policy. Our results demonstrate the effectiveness of the proposed approach in achieving near-optimal bandwidth allocation with reduced overhead, making it a practical solution for real-time wireless applications.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
Survey on the Evaluation of Generative Models in Music
Authors:
Alexander Lerch,
Claire Arthur,
Nick Bryan-Kinns,
Corey Ford,
Qianyi Sun,
Ashvala Vinay
Abstract:
Research on generative systems in music has seen considerable attention and growth in recent years. A variety of attempts have been made to systematically evaluate such systems. We provide an interdisciplinary review of the common evaluation targets, methodologies, and metrics for the evaluation of both system output and model usability, covering subjective and objective approaches, qualitative an…
▽ More
Research on generative systems in music has seen considerable attention and growth in recent years. A variety of attempts have been made to systematically evaluate such systems. We provide an interdisciplinary review of the common evaluation targets, methodologies, and metrics for the evaluation of both system output and model usability, covering subjective and objective approaches, qualitative and quantitative approaches, as well as empirical and computational methods. We discuss the advantages and challenges of such approaches from a musicological, an engineering, and an HCI perspective.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering
Authors:
Vinay Joshi,
Pratik Prabhanjan Brahma,
Zicheng Liu,
Emad Barsoum
Abstract:
The key-value (KV) cache in transformer models is a critical component for efficient decoding or inference, yet its memory demands scale poorly with sequence length, posing a major challenge for scalable deployment of large language models. Among several approaches to KV cache compression, quantization of key and value activations has been widely explored. Most KV cache quantization methods still…
▽ More
The key-value (KV) cache in transformer models is a critical component for efficient decoding or inference, yet its memory demands scale poorly with sequence length, posing a major challenge for scalable deployment of large language models. Among several approaches to KV cache compression, quantization of key and value activations has been widely explored. Most KV cache quantization methods still need to manage sparse and noncontiguous outliers separately. To address this, we introduce TaDA, a training-free recipe for KV cache compression with quantization precision that adapts to error sensitivity across layers and a mean centering to eliminate separate outlier handling. Our approach yields substantial accuracy improvements for multiple models supporting various context lengths. Moreover, our approach does not need to separately manage outlier elements -- a persistent hurdle in most traditional quantization methods. Experiments on standard benchmarks demonstrate that our technique reduces KV cache memory footprint to 27% of the original 16-bit baseline while achieving comparable accuracy. Our method paves the way for scalable and high-performance reasoning in language models by potentially enabling inference for longer context length models, reasoning models, and longer chain of thoughts.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
BEAR: BGP Event Analysis and Reporting
Authors:
Hanqing Li,
Melania Fedeli,
Vinay Kolar,
Diego Klabjan
Abstract:
The Internet comprises of interconnected, independently managed Autonomous Systems (AS) that rely on the Border Gateway Protocol (BGP) for inter-domain routing. BGP anomalies--such as route leaks and hijacks--can divert traffic through unauthorized or inefficient paths, jeopardizing network reliability and security. Although existing rule-based and machine learning methods can detect these anomali…
▽ More
The Internet comprises of interconnected, independently managed Autonomous Systems (AS) that rely on the Border Gateway Protocol (BGP) for inter-domain routing. BGP anomalies--such as route leaks and hijacks--can divert traffic through unauthorized or inefficient paths, jeopardizing network reliability and security. Although existing rule-based and machine learning methods can detect these anomalies using structured metrics, they still require experts with in-depth BGP knowledge of, for example, AS relationships and historical incidents, to interpret events and propose remediation. In this paper, we introduce BEAR (BGP Event Analysis and Reporting), a novel framework that leverages large language models (LLMs) to automatically generate comprehensive reports explaining detected BGP anomaly events. BEAR employs a multi-step reasoning process that translates tabular BGP data into detailed textual narratives, enhancing interpretability and analytical precision. To address the limited availability of publicly documented BGP anomalies, we also present a synthetic data generation framework powered by LLMs. Evaluations on both real and synthetic datasets demonstrate that BEAR achieves 100% accuracy, outperforming Chain-of-Thought and in-context learning baselines. This work pioneers an automated approach for explaining BGP anomaly events, offering valuable operational insights for network management.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
MRSD: Multi-Resolution Skill Discovery for HRL Agents
Authors:
Shashank Sharma,
Janina Hoffmann,
Vinay Namboodiri
Abstract:
Hierarchical reinforcement learning (HRL) relies on abstract skills to solve long-horizon tasks efficiently. While existing skill discovery methods learns these skills automatically, they are limited to a single skill per task. In contrast, humans learn and use both fine-grained and coarse motor skills simultaneously. Inspired by human motor control, we propose Multi-Resolution Skill Discovery (MR…
▽ More
Hierarchical reinforcement learning (HRL) relies on abstract skills to solve long-horizon tasks efficiently. While existing skill discovery methods learns these skills automatically, they are limited to a single skill per task. In contrast, humans learn and use both fine-grained and coarse motor skills simultaneously. Inspired by human motor control, we propose Multi-Resolution Skill Discovery (MRSD), an HRL framework that learns multiple skill encoders at different temporal resolutions in parallel. A high-level manager dynamically selects among these skills, enabling adaptive control strategies over time. We evaluate MRSD on tasks from the DeepMind Control Suite and show that it outperforms prior state-of-the-art skill discovery and HRL methods, achieving faster convergence and higher final performance. Our findings highlight the benefits of integrating multi-resolution skills in HRL, paving the way for more versatile and efficient agents.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
Securing Credit Inquiries: The Role of Real-Time User Approval in Preventing SSN Identity Theft
Authors:
Gogulakrishnan Thiyagarajan,
Vinay Bist,
Prabhudarshi Nayak
Abstract:
Unauthorized credit inquiries are also a central entry point for identity theft, with Social Security Numbers (SSNs) being widely utilized in fraudulent cases. Traditional credit inquiry systems do not usually possess strict user authentication, making them vulnerable to unauthorized access. This paper proposes a real-time user authorization system to enhance security by enforcing explicit user ap…
▽ More
Unauthorized credit inquiries are also a central entry point for identity theft, with Social Security Numbers (SSNs) being widely utilized in fraudulent cases. Traditional credit inquiry systems do not usually possess strict user authentication, making them vulnerable to unauthorized access. This paper proposes a real-time user authorization system to enhance security by enforcing explicit user approval before processing any credit inquiry. The system employs real-time verification and approval techniques. This ensures that the authorized user only approves or rejects a credit check request. It minimizes the risks of interference by third parties. Apart from enhancing security, this system complies with regulations like the General Data Protection Regulation (GDPR) and the Fair Credit Reporting Act (FCRA) while maintaining a seamless user experience. This article discusses the technical issues, scaling-up issues, and ways of implementing real-time user authorization in financial systems. Through this framework, financial institutions can drastically minimize the risk of identity theft, avert unauthorized credit checks, and increase customer trust in the credit verification system.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
The Hidden Dangers of Outdated Software: A Cyber Security Perspective
Authors:
Gogulakrishnan Thiyagarajan,
Vinay Bist,
Prabhudarshi Nayak
Abstract:
Outdated software remains a potent and underappreciated menace in 2025's cybersecurity environment, exposing systems to a broad array of threats, including ransomware, data breaches, and operational outages that can have devastating and far-reaching impacts. This essay explores the unseen threats of cyberattacks by presenting robust statistical information, including the staggering reality that 32…
▽ More
Outdated software remains a potent and underappreciated menace in 2025's cybersecurity environment, exposing systems to a broad array of threats, including ransomware, data breaches, and operational outages that can have devastating and far-reaching impacts. This essay explores the unseen threats of cyberattacks by presenting robust statistical information, including the staggering reality that 32% of cyberattacks exploit unpatched software vulnerabilities, based on a 2025 TechTarget survey. Furthermore, it discusses real case studies, including the MOVEit breach in 2023 and the Log4Shell breach in 2021, both of which illustrate the catastrophic consequences of failing to perform software updates. The article offers a detailed analysis of the nature of software vulnerabilities, the underlying reasons for user resistance to patches, and organizational barriers that compound the issue. Furthermore, it suggests actionable solutions, including automation and awareness campaigns, to address these shortcomings. Apart from this, the paper also talks of trends such as AI-driven vulnerability patching and legal consequences of non-compliance under laws like HIPAA, thus providing a futuristic outlook on how such advancements may define future defenses. Supplemented by tables like one detailing trends in vulnerability and a graph illustrating technology adoption, this report showcases the pressing demand for anticipatory update strategies to safeguard digital ecosystems against the constantly evolving threats that characterize the modern cyber landscape. As it stands, it is a very useful document for practitioners, policymakers, and researchers.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
CIE: Controlling Language Model Text Generations Using Continuous Signals
Authors:
Vinay Samuel,
Harshita Diddee,
Yiming Zhang,
Daphne Ippolito
Abstract:
Aligning language models with user intent is becoming increasingly relevant to enhance user experience. This calls for designing methods that can allow users to control the properties of the language that LMs generate. For example, controlling the length of the generation, the complexity of the language that gets chosen, the sentiment, tone, etc. Most existing work attempts to integrate users' con…
▽ More
Aligning language models with user intent is becoming increasingly relevant to enhance user experience. This calls for designing methods that can allow users to control the properties of the language that LMs generate. For example, controlling the length of the generation, the complexity of the language that gets chosen, the sentiment, tone, etc. Most existing work attempts to integrate users' control by conditioning LM generations on natural language prompts or discrete control signals, which are often brittle and hard to scale. In this work, we are interested in \textit{continuous} control signals, ones that exist along a spectrum that can't easily be captured in a natural language prompt or via existing techniques in conditional generation. Through a case study in controlling the precise response-length of generations produced by LMs, we demonstrate how after fine-tuning, behaviors of language models can be controlled via continuous signals -- as vectors that are interpolated between a "low" and a "high" token embedding. Our method more reliably exerts response-length control than in-context learning methods or fine-tuning methods that represent the control signal as a discrete signal. Our full open-sourced code and datasets are available at https://github.com/vsamuel2003/CIE.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models
Authors:
Aryan Das,
Tanishq Rachamalla,
Pravendra Singh,
Koushik Biswas,
Vinay Kumar Verma,
Swalpa Kumar Roy
Abstract:
We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery.…
▽ More
We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery. This dataset enhances model performance in tasks like classification and feature extraction, providing a valuable resource for advanced remote sensing applications. HyperCap is constructed from four benchmark datasets and annotated through a hybrid approach combining automated and manual methods to ensure accuracy and consistency. Empirical evaluations using state-of-the-art encoders and diverse fusion techniques demonstrate significant improvements in classification performance. These results underscore the potential of vision-language learning in HSI and position HyperCap as a foundational dataset for future research in the field.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care
Authors:
Zhi Da Soh,
Yang Bai,
Kai Yu,
Yang Zhou,
Xiaofeng Lei,
Sahil Thakur,
Zann Lee,
Lee Ching Linette Phang,
Qingsheng Peng,
Can Can Xue,
Rachel Shujuan Chong,
Quan V. Hoang,
Lavanya Raghavan,
Yih Chung Tham,
Charumathi Sabanayagam,
Wei-Chi Wu,
Ming-Chih Ho,
Jiangnan He,
Preeti Gupta,
Ecosse Lamoureux,
Seang Mei Saw,
Vinay Nangia,
Songhomitra Panda-Jonas,
Jie Xu,
Ya Xing Wang
, et al. (6 additional authors not shown)
Abstract:
Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptati…
▽ More
Current deep learning models are mostly task specific and lack a user-friendly interface to operate. We present Meta-EyeFM, a multi-function foundation model that integrates a large language model (LLM) with vision foundation models (VFMs) for ocular disease assessment. Meta-EyeFM leverages a routing mechanism to enable accurate task-specific analysis based on text queries. Using Low Rank Adaptation, we fine-tuned our VFMs to detect ocular and systemic diseases, differentiate ocular disease severity, and identify common ocular signs. The model achieved 100% accuracy in routing fundus images to appropriate VFMs, which achieved $\ge$ 82.2% accuracy in disease detection, $\ge$ 89% in severity differentiation, $\ge$ 76% in sign identification. Meta-EyeFM was 11% to 43% more accurate than Gemini-1.5-flash and ChatGPT-4o LMMs in detecting various eye diseases and comparable to an ophthalmologist. This system offers enhanced usability and diagnostic performance, making it a valuable decision support tool for primary eye care or an online LLM for fundus evaluation.
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
A Neural Network Mode for PX4 on Embedded Flight Controllers
Authors:
Sindre M. Hegre,
Welf Rehberg,
Mihir Kulkarni,
Kostas Alexis
Abstract:
This paper contributes an open-sourced implementation of a neural-network based controller framework within the PX4 stack. We develop a custom module for inference on the microcontroller while retaining all of the functionality of the PX4 autopilot. Policies trained in the Aerial Gym Simulator are converted to the TensorFlow Lite format and then built together with PX4 and flashed to the flight co…
▽ More
This paper contributes an open-sourced implementation of a neural-network based controller framework within the PX4 stack. We develop a custom module for inference on the microcontroller while retaining all of the functionality of the PX4 autopilot. Policies trained in the Aerial Gym Simulator are converted to the TensorFlow Lite format and then built together with PX4 and flashed to the flight controller. The policies substitute the control-cascade within PX4 to offer an end-to-end position-setpoint tracking controller directly providing normalized motor RPM setpoints. Experiments conducted in simulation and the real-world show similar tracking performance. We thus provide a flight-ready pipeline for testing neural control policies in the real world. The pipeline simplifies the deployment of neural networks on embedded flight controller hardware thereby accelerating research on learning-based control. Both the Aerial Gym Simulator and the PX4 module are open-sourced at https://github.com/ntnu-arl/aerial_gym_simulator and https://github.com/SindreMHegre/PX4-Autopilot-public/tree/for_paper. Video: https://youtu.be/lY1OKz_UOqM?si=VtzL243BAY3lblTJ.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
Boosting-Enabled Robust System Identification of Partially Observed LTI Systems Under Heavy-Tailed Noise
Authors:
Vinay Kanakeri,
Aritra Mitra
Abstract:
We consider the problem of system identification of partially observed linear time-invariant (LTI) systems. Given input-output data, we provide non-asymptotic guarantees for identifying the system parameters under general heavy-tailed noise processes. Unlike previous works that assume Gaussian or sub-Gaussian noise, we consider significantly broader noise distributions that are required to admit o…
▽ More
We consider the problem of system identification of partially observed linear time-invariant (LTI) systems. Given input-output data, we provide non-asymptotic guarantees for identifying the system parameters under general heavy-tailed noise processes. Unlike previous works that assume Gaussian or sub-Gaussian noise, we consider significantly broader noise distributions that are required to admit only up to the second moment. For this setting, we leverage tools from robust statistics to propose a novel system identification algorithm that exploits the idea of boosting. Despite the much weaker noise assumptions, we show that our proposed algorithm achieves sample complexity bounds that nearly match those derived under sub-Gaussian noise. In particular, we establish that our bounds retain a logarithmic dependence on the prescribed failure probability. Interestingly, we show that such bounds can be achieved by requiring just a finite fourth moment on the excitatory input process.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
Equi-Euler GraphNet: An Equivariant, Temporal-Dynamics Informed Graph Neural Network for Dual Force and Trajectory Prediction in Multi-Body Systems
Authors:
Vinay Sharma,
Rémi Tanguy Oddon,
Pietro Tesini,
Jens Ravesloot,
Cees Taal,
Olga Fink
Abstract:
Accurate real-time modeling of multi-body dynamical systems is essential for enabling digital twin applications across industries. While many data-driven approaches aim to learn system dynamics, jointly predicting internal loads and system trajectories remains a key challenge. This dual prediction is especially important for fault detection and predictive maintenance, where internal loads-such as…
▽ More
Accurate real-time modeling of multi-body dynamical systems is essential for enabling digital twin applications across industries. While many data-driven approaches aim to learn system dynamics, jointly predicting internal loads and system trajectories remains a key challenge. This dual prediction is especially important for fault detection and predictive maintenance, where internal loads-such as contact forces-act as early indicators of faults, reflecting wear or misalignment before affecting motion. These forces also serve as inputs to degradation models (e.g., crack growth), enabling damage prediction and remaining useful life estimation. We propose Equi-Euler GraphNet, a physics-informed graph neural network (GNN) that simultaneously predicts internal forces and global trajectories in multi-body systems. In this mesh-free framework, nodes represent system components and edges encode interactions. Equi-Euler GraphNet introduces two inductive biases: (1) an equivariant message-passing scheme, interpreting edge messages as interaction forces consistent under Euclidean transformations; and (2) a temporal-aware iterative node update mechanism, based on Euler integration, to capture influence of distant interactions over time. Tailored for cylindrical roller bearings, it decouples ring dynamics from constrained motion of rolling elements. Trained on high-fidelity multiphysics simulations, Equi-Euler GraphNet generalizes beyond the training distribution, accurately predicting loads and trajectories under unseen speeds, loads, and configurations. It outperforms state-of-the-art GNNs focused on trajectory prediction, delivering stable rollouts over thousands of time steps with minimal error accumulation. Achieving up to a 200x speedup over conventional solvers while maintaining comparable accuracy, it serves as an efficient reduced-order model for digital twins, design, and maintenance.
△ Less
Submitted 25 April, 2025; v1 submitted 18 April, 2025;
originally announced April 2025.
-
In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?
Authors:
Ben Bucknall,
Saad Siddiqui,
Lara Thurnherr,
Conor McGurk,
Ben Harack,
Anka Reuel,
Patricia Paskov,
Casey Mahoney,
Sören Mindermann,
Scott Singer,
Vinay Hiremath,
Charbel-Raphaël Segerie,
Oscar Delaney,
Alessandro Abate,
Fazl Barez,
Michael K. Cohen,
Philip Torr,
Ferenc Huszár,
Anisoara Calinescu,
Gabriel Davis Jones,
Yoshua Bengio,
Robert Trager
Abstract:
International cooperation is common in AI research, including between geopolitical rivals. While many experts advocate for greater international cooperation on AI safety to address shared global risks, some view cooperation on AI with suspicion, arguing that it can pose unacceptable risks to national security. However, the extent to which cooperation on AI safety poses such risks, as well as provi…
▽ More
International cooperation is common in AI research, including between geopolitical rivals. While many experts advocate for greater international cooperation on AI safety to address shared global risks, some view cooperation on AI with suspicion, arguing that it can pose unacceptable risks to national security. However, the extent to which cooperation on AI safety poses such risks, as well as provides benefits, depends on the specific area of cooperation. In this paper, we consider technical factors that impact the risks of international cooperation on AI safety research, focusing on the degree to which such cooperation can advance dangerous capabilities, result in the sharing of sensitive information, or provide opportunities for harm. We begin by why nations historically cooperate on strategic technologies and analyse current US-China cooperation in AI as a case study. We further argue that existing frameworks for managing associated risks can be supplemented with consideration of key risks specific to cooperation on technical AI safety research. Through our analysis, we find that research into AI verification mechanisms and shared protocols may be suitable areas for such cooperation. Through this analysis we aim to help researchers and governments identify and mitigate the risks of international cooperation on AI safety research, so that the benefits of cooperation can be fully realised.
△ Less
Submitted 17 April, 2025;
originally announced April 2025.
-
WaterFlow: Learning Fast & Robust Watermarks using Stable Diffusion
Authors:
Vinay Shukla,
Prachee Sharma,
Ryan Rossi,
Sungchul Kim,
Tong Yu,
Aditya Grover
Abstract:
The ability to embed watermarks in images is a fundamental problem of interest for computer vision, and is exacerbated by the rapid rise of generated imagery in recent times. Current state-of-the-art techniques suffer from computational and statistical challenges such as the slow execution speed for practical deployments. In addition, other works trade off fast watermarking speeds but suffer great…
▽ More
The ability to embed watermarks in images is a fundamental problem of interest for computer vision, and is exacerbated by the rapid rise of generated imagery in recent times. Current state-of-the-art techniques suffer from computational and statistical challenges such as the slow execution speed for practical deployments. In addition, other works trade off fast watermarking speeds but suffer greatly in their robustness or perceptual quality. In this work, we propose WaterFlow (WF), a fast and extremely robust approach for high fidelity visual watermarking based on a learned latent-dependent watermark. Our approach utilizes a pretrained latent diffusion model to encode an arbitrary image into a latent space and produces a learned watermark that is then planted into the Fourier Domain of the latent. The transformation is specified via invertible flow layers that enhance the expressivity of the latent space of the pre-trained model to better preserve image quality while permitting robust and tractable detection. Most notably, WaterFlow demonstrates state-of-the-art performance on general robustness and is the first method capable of effectively defending against difficult combination attacks. We validate our findings on three widely used real and generated datasets: MS-COCO, DiffusionDB, and WikiArt.
△ Less
Submitted 17 April, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
Authors:
Diljeet Jagpal,
Xi Chen,
Vinay P. Namboodiri
Abstract:
Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models. Current methods in this space require specific architectural changes to image generation models, which limit their adaptability and scalability. In contrast to such methods, we provide a model-agnostic approach. We use intersections in diffusi…
▽ More
Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models. Current methods in this space require specific architectural changes to image generation models, which limit their adaptability and scalability. In contrast to such methods, we provide a model-agnostic approach. We use intersections in diffusion trajectories, working only with the latent values. We could not obtain localized frame-wise coherence and diversity using only the intersection of trajectories. Thus, we instead use a grid-based approach. An in-context trained LLM is used to generate coherent frame-wise prompts; another is used to identify differences between frames. Based on these, we obtain a CLIP-based attention mask that controls the timing of switching the prompts for each grid cell. Earlier switching results in higher variance, while later switching results in more coherence. Therefore, our approach can ensure appropriate control between coherence and variance for the frames. Our approach results in state-of-the-art performance while being more flexible when working with diverse image-generation models. The empirical analysis using quantitative metrics and user studies confirms our model's superior temporal consistency, visual fidelity and user satisfaction, thus providing a novel way to obtain training-free, image-based text-to-video generation.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Authors:
Yiming Zhang,
Harshita Diddee,
Susan Holm,
Hanchen Liu,
Xinyue Liu,
Vinay Samuel,
Barry Wang,
Daphne Ippolito
Abstract:
Language models have demonstrated remarkable capabilities on standard benchmarks, yet they struggle increasingly from mode collapse, the inability to generate diverse and novel outputs. Our work introduces NoveltyBench, a benchmark specifically designed to evaluate the ability of language models to produce multiple distinct and high-quality outputs. NoveltyBench utilizes prompts curated to elicit…
▽ More
Language models have demonstrated remarkable capabilities on standard benchmarks, yet they struggle increasingly from mode collapse, the inability to generate diverse and novel outputs. Our work introduces NoveltyBench, a benchmark specifically designed to evaluate the ability of language models to produce multiple distinct and high-quality outputs. NoveltyBench utilizes prompts curated to elicit diverse answers and filtered real-world user queries. Evaluating 20 leading language models, we find that current state-of-the-art systems generate significantly less diversity than human writers. Notably, larger models within a family often exhibit less diversity than their smaller counterparts, challenging the notion that capability on standard benchmarks translates directly to generative utility. While prompting strategies like in-context regeneration can elicit diversity, our findings highlight a fundamental lack of distributional diversity in current models, reducing their utility for users seeking varied responses and suggesting the need for new training and evaluation paradigms that prioritize diversity alongside quality.
△ Less
Submitted 24 May, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
Authors:
Varsha Embar,
Ritvik Shrivastava,
Vinay Damodaran,
Travis Mehlinger,
Yu-Chung Hsiao,
Karthik Raghunathan
Abstract:
Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper delineates our system that automates call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering a…
▽ More
Large Language Models have transformed the Contact Center industry, manifesting in enhanced self-service tools, streamlined administrative processes, and augmented agent productivity. This paper delineates our system that automates call driver generation, which serves as the foundation for tasks such as topic modeling, incoming call classification, trend detection, and FAQ generation, delivering actionable insights for contact center agents and administrators to consume. We present a cost-efficient LLM system design, with 1) a comprehensive evaluation of proprietary, open-weight, and fine-tuned models and 2) cost-efficient strategies, and 3) the corresponding cost analysis when deployed in production environments.
△ Less
Submitted 24 March, 2025;
originally announced March 2025.
-
The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval
Authors:
Firoj Alam,
Julia Maria Struß,
Tanmoy Chakraborty,
Stefan Dietze,
Salim Hafid,
Katerina Korre,
Arianna Muti,
Preslav Nakov,
Federico Ruggeri,
Sebastian Schellhammer,
Vinay Setty,
Megha Sundriyal,
Konstantin Todorov,
Venktesh V
Abstract:
The CheckThat! lab aims to advance the development of innovative technologies designed to identify and counteract online disinformation and manipulation efforts across various languages and platforms. The first five editions focused on key tasks in the information verification pipeline, including check-worthiness, evidence retrieval and pairing, and verification. Since the 2023 edition, the lab ha…
▽ More
The CheckThat! lab aims to advance the development of innovative technologies designed to identify and counteract online disinformation and manipulation efforts across various languages and platforms. The first five editions focused on key tasks in the information verification pipeline, including check-worthiness, evidence retrieval and pairing, and verification. Since the 2023 edition, the lab has expanded its scope to address auxiliary tasks that support research and decision-making in verification. In the 2025 edition, the lab revisits core verification tasks while also considering auxiliary challenges. Task 1 focuses on the identification of subjectivity (a follow-up from CheckThat! 2024), Task 2 addresses claim normalization, Task 3 targets fact-checking numerical claims, and Task 4 explores scientific web discourse processing. These tasks present challenging classification and retrieval problems at both the document and span levels, including multilingual settings.
△ Less
Submitted 18 March, 2025;
originally announced March 2025.
-
ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning
Authors:
Soumya Banerjee,
Vinay Kumar Verma
Abstract:
Active learning aims to select optimal samples for labeling, minimizing annotation costs. This paper introduces a unified representation learning framework tailored for active learning with task awareness. It integrates diverse sources, comprising reconstruction, adversarial, self-supervised, knowledge-distillation, and classification losses into a unified VAE-based ADROIT approach. The proposed a…
▽ More
Active learning aims to select optimal samples for labeling, minimizing annotation costs. This paper introduces a unified representation learning framework tailored for active learning with task awareness. It integrates diverse sources, comprising reconstruction, adversarial, self-supervised, knowledge-distillation, and classification losses into a unified VAE-based ADROIT approach. The proposed approach comprises three key components - a unified representation generator (VAE), a state discriminator, and a (proxy) task-learner or classifier. ADROIT learns a latent code using both labeled and unlabeled data, incorporating task-awareness by leveraging labeled data with the proxy classifier. Unlike previous approaches, the proxy classifier additionally employs a self-supervised loss on unlabeled data and utilizes knowledge distillation to align with the target task-learner. The state discriminator distinguishes between labeled and unlabeled data, facilitating the selection of informative unlabeled samples. The dynamic interaction between VAE and the state discriminator creates a competitive environment, with the VAE attempting to deceive the discriminator, while the state discriminator learns to differentiate between labeled and unlabeled inputs. Extensive evaluations on diverse datasets and ablation analysis affirm the effectiveness of the proposed model.
△ Less
Submitted 10 March, 2025;
originally announced March 2025.
-
MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering
Authors:
Vinay Kumar Verma,
Shreyas Sunil Kulkarni,
Happy Mittal,
Deepak Gupta
Abstract:
Question Answering (QA) and Visual Question Answering (VQA) are well-studied problems in the language and vision domain. One challenging scenario involves multiple sources of information, each of a different modality, where the answer to the question may exist in one or more sources. This scenario contains richer information but is highly complex to handle. In this work, we formulate a novel quest…
▽ More
Question Answering (QA) and Visual Question Answering (VQA) are well-studied problems in the language and vision domain. One challenging scenario involves multiple sources of information, each of a different modality, where the answer to the question may exist in one or more sources. This scenario contains richer information but is highly complex to handle. In this work, we formulate a novel question-answer generation (QAG) framework in an environment containing multi-source, multimodal information. The answer may belong to any or all sources; therefore, selecting the most prominent answer source or an optimal combination of all sources for a given question is challenging. To address this issue, we propose a question-guided attention mechanism that learns attention across multiple sources and decodes this information for robust and unbiased answer generation. To learn attention within each source, we introduce an explicit alignment between questions and various information sources, which facilitates identifying the most pertinent parts of the source information relative to the question. Scalability in handling diverse questions poses a challenge. We address this by extending our model to a sparse mixture-of-experts (sparse-MoE) framework, enabling it to handle thousands of question types. Experiments on T5 and Flan-T5 using three datasets demonstrate the model's efficacy, supported by ablation studies.
△ Less
Submitted 8 March, 2025;
originally announced March 2025.
-
Organize, Then Vote: Exploring Cognitive Load in Quadratic Survey Interfaces
Authors:
Ti-Chung Cheng,
Yutong Zhang,
Yi-Hung Chou,
Vinay Koshy,
Tiffany Wenting Li,
Karrie Karahalios,
Hari Sundaram
Abstract:
Quadratic Surveys (QSs) elicit more accurate preferences than traditional methods like Likert-scale surveys. However, the cognitive load associated with QSs has hindered their adoption in digital surveys for collective decision-making. We introduce a two-phase "organize-then-vote" QS to reduce cognitive load. As interface design significantly impacts survey results and accuracy, our design scaffol…
▽ More
Quadratic Surveys (QSs) elicit more accurate preferences than traditional methods like Likert-scale surveys. However, the cognitive load associated with QSs has hindered their adoption in digital surveys for collective decision-making. We introduce a two-phase "organize-then-vote" QS to reduce cognitive load. As interface design significantly impacts survey results and accuracy, our design scaffolds survey takers' decision-making while managing the cognitive load imposed by QS. In a 2x2 between-subject in-lab study on public resource allotment, we compared our interface with a traditional text interface across a QS with 6 (short) and 24 (long) options. Two-phase interface participants spent more time per option and exhibited shorter voting edit distances. We qualitatively observed shifts in cognitive effort from mechanical operations to constructing more comprehensive preferences. We conclude that this interface promoted deeper engagement, potentially reducing satisficing behaviors caused by cognitive overload in longer QSs. This research clarifies how human-centered design improves preference elicitation tools for collective decision-making.
△ Less
Submitted 16 May, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
Aerial Gym Simulator: A Framework for Highly Parallelized Simulation of Aerial Robots
Authors:
Mihir Kulkarni,
Welf Rehberg,
Kostas Alexis
Abstract:
This paper contributes the Aerial Gym Simulator, a highly parallelized, modular framework for simulation and rendering of arbitrary multirotor platforms based on NVIDIA Isaac Gym. Aerial Gym supports the simulation of under-, fully- and over-actuated multirotors offering parallelized geometric controllers, alongside a custom GPU-accelerated rendering framework for ray-casting capable of capturing…
▽ More
This paper contributes the Aerial Gym Simulator, a highly parallelized, modular framework for simulation and rendering of arbitrary multirotor platforms based on NVIDIA Isaac Gym. Aerial Gym supports the simulation of under-, fully- and over-actuated multirotors offering parallelized geometric controllers, alongside a custom GPU-accelerated rendering framework for ray-casting capable of capturing depth, segmentation and vertex-level annotations from the environment. Multiple examples for key tasks, such as depth-based navigation through reinforcement learning are provided. The comprehensive set of tools developed within the framework makes it a powerful resource for research on learning for control, planning, and navigation using state information as well as exteroceptive sensor observations. Extensive simulation studies are conducted and successful sim2real transfer of trained policies is demonstrated. The Aerial Gym Simulator is open-sourced at: https://github.com/ntnu-arl/aerial_gym_simulator.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems
Authors:
Hao Zhang,
Weiwei Li,
Rilin Chen,
Vinay Kothapally,
Meng Yu,
Dong Yu
Abstract:
Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD pred…
▽ More
Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD predicts four control tokens to regulate turn-switching and turn-keeping, distinguishing between intentional and unintentional barge-ins while detecting query completion for handling user pauses and hesitations. By processing input speech in short intervals, the semantic VAD enables real-time decision-making, while the core dialogue engine (CDE) is only activated for response generation, reducing computational overhead. This design allows independent DM optimization without retraining the CDE, balancing interaction accuracy and inference efficiency for scalable, next-generation full-duplex SDS.
△ Less
Submitted 24 February, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Chronic Diseases Prediction Using ML
Authors:
Sri Varsha Mulakala,
G. Neeharika,
P. Vinay Kumar,
A. Bhargava Kiran
Abstract:
The recent increase in morbidity is primarily due to chronic diseases including Diabetes, Heart disease, Lung cancer, and brain tumours. The results for patients can be improved, and the financial burden on the healthcare system can be lessened, through the early detection and prevention of certain disorders. In this study, we built a machine-learning model for predicting the existence of numerous…
▽ More
The recent increase in morbidity is primarily due to chronic diseases including Diabetes, Heart disease, Lung cancer, and brain tumours. The results for patients can be improved, and the financial burden on the healthcare system can be lessened, through the early detection and prevention of certain disorders. In this study, we built a machine-learning model for predicting the existence of numerous diseases utilising datasets from various sources, including Kaggle, Dataworld, and the UCI repository, that are relevant to each of the diseases we intended to predict.
Following the acquisition of the datasets, we used feature engineering to extract pertinent features from the information, after which the model was trained on a training set and improved using a validation set. A test set was then used to assess the correctness of the final model. We provide an easy-to-use interface where users may enter the parameters for the selected ailment. Once the right model has been run, it will indicate whether the user has a certain ailment and offer suggestions for how to treat or prevent it.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking
Authors:
Venktesh V,
Vinay Setty
Abstract:
The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. A significant challenge in this process is not only retrieving relevant information, but also identifying evidence that can both support and refute complex claims. Traditional retrieval methods may return documents that directly address claims or l…
▽ More
The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. A significant challenge in this process is not only retrieving relevant information, but also identifying evidence that can both support and refute complex claims. Traditional retrieval methods may return documents that directly address claims or lean toward supporting them, but often struggle with more complex claims requiring indirect reasoning. While some existing benchmarks and methods target retrieval for fact-checking, a comprehensive real-world open-domain benchmark has been lacking. In this paper, we present a real-world retrieval benchmark FactIR, derived from Factiverse production logs, enhanced with human annotations. We rigorously evaluate state-of-the-art retrieval models in a zero-shot setup on FactIR and offer insights for developing practical retrieval systems for fact-checking. Code and data are available at https://github.com/factiverse/factIR.
△ Less
Submitted 9 February, 2025;
originally announced February 2025.
-
FlashCheck: Exploration of Efficient Evidence Retrieval for Fast Fact-Checking
Authors:
Kevin Nanekhan,
Venktesh V,
Erik Martin,
Henrik Vatndal,
Vinay Setty,
Avishek Anand
Abstract:
The advances in digital tools have led to the rampant spread of misinformation. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. It is essential for automated fact-checking to be efficient for aiding in combating misinformation in real-time and at the source. Fact-checking pipelines primarily comprise a knowledge retrieval component which extracts relev…
▽ More
The advances in digital tools have led to the rampant spread of misinformation. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. It is essential for automated fact-checking to be efficient for aiding in combating misinformation in real-time and at the source. Fact-checking pipelines primarily comprise a knowledge retrieval component which extracts relevant knowledge to fact-check a claim from large knowledge sources like Wikipedia and a verification component. The existing works primarily focus on the fact-verification part rather than evidence retrieval from large data collections, which often face scalability issues for practical applications such as live fact-checking. In this study, we address this gap by exploring various methods for indexing a succinct set of factual statements from large collections like Wikipedia to enhance the retrieval phase of the fact-checking pipeline. We also explore the impact of vector quantization to further improve the efficiency of pipelines that employ dense retrieval approaches for first-stage retrieval. We study the efficiency and effectiveness of the approaches on fact-checking datasets such as HoVer and WiCE, leveraging Wikipedia as the knowledge source. We also evaluate the real-world utility of the efficient retrieval approaches by fact-checking 2024 presidential debate and also open source the collection of claims with corresponding labels identified in the debate. Through a combination of indexed facts together with Dense retrieval and Index compression, we achieve up to a 10.0x speedup on CPUs and more than a 20.0x speedup on GPUs compared to the classical fact-checking pipelines over large collections.
△ Less
Submitted 16 February, 2025; v1 submitted 9 February, 2025;
originally announced February 2025.
-
Bridging the Gap in XAI-Why Reliable Metrics Matter for Explainability and Compliance
Authors:
Pratinav Seth,
Vinay Kumar Sankarapu
Abstract:
This position paper emphasizes the critical gap in the evaluation of Explainable AI (XAI) due to the lack of standardized and reliable metrics, which diminishes its practical value, trustworthiness, and ability to meet regulatory requirements. Current evaluation methods are often fragmented, subjective, and biased, making them prone to manipulation and complicating the assessment of complex models…
▽ More
This position paper emphasizes the critical gap in the evaluation of Explainable AI (XAI) due to the lack of standardized and reliable metrics, which diminishes its practical value, trustworthiness, and ability to meet regulatory requirements. Current evaluation methods are often fragmented, subjective, and biased, making them prone to manipulation and complicating the assessment of complex models. A central issue is the absence of a ground truth for explanations, complicating comparisons across various XAI approaches. To address these challenges, we advocate for widespread research into developing robust, context-sensitive evaluation metrics. These metrics should be resistant to manipulation, relevant to each use case, and based on human judgment and real-world applicability. We also recommend creating domain-specific evaluation benchmarks that align with the user and regulatory needs of sectors such as healthcare and finance. By encouraging collaboration among academia, industry, and regulators, we can create standards that balance flexibility and consistency, ensuring XAI explanations are meaningful, trustworthy, and compliant with evolving regulations.
△ Less
Submitted 7 February, 2025;
originally announced February 2025.
-
xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods
Authors:
Pratinav Seth,
Yashwardhan Rathore,
Neeraj Kumar Singh,
Chintan Chitroda,
Vinay Kumar Sankarapu
Abstract:
The growing complexity of machine learning and deep learning models has led to an increased reliance on opaque "black box" systems, making it difficult to understand the rationale behind predictions. This lack of transparency is particularly challenging in high-stakes applications where interpretability is as important as accuracy. Post-hoc explanation methods are commonly used to interpret these…
▽ More
The growing complexity of machine learning and deep learning models has led to an increased reliance on opaque "black box" systems, making it difficult to understand the rationale behind predictions. This lack of transparency is particularly challenging in high-stakes applications where interpretability is as important as accuracy. Post-hoc explanation methods are commonly used to interpret these models, but they are seldom rigorously evaluated, raising concerns about their reliability. The Python package xai_evals addresses this by providing a comprehensive framework for generating, benchmarking, and evaluating explanation methods across both tabular and image data modalities. It integrates popular techniques like SHAP, LIME, Grad-CAM, Integrated Gradients (IG), and Backtrace, while supporting evaluation metrics such as faithfulness, sensitivity, and robustness. xai_evals enhances the interpretability of machine learning models, fostering transparency and trust in AI systems. The library is open-sourced at https://pypi.org/project/xai-evals/ .
△ Less
Submitted 5 February, 2025;
originally announced February 2025.
-
DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Authors:
Shashank Sharma,
Janina Hoffmann,
Vinay Namboodiri
Abstract:
Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goal…
▽ More
Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate $100\%$ success rate (vs $82\%$ baseline) and $73$-step average episode length (vs $158$-step baseline). The method also generalizes to momentum-based control tasks and requires only $\log N$ steps for replanning. Theoretical analysis and ablations validate our design choices.
△ Less
Submitted 27 May, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Quantum Internet: Technologies, Protocols, and Research Challenges
Authors:
Vinay Kumar,
Claudio Cicconetti,
Marco Conti,
Andrea Passarella
Abstract:
As the field of the quantum internet advances, a comprehensive guide to navigate its complexities has become increasingly crucial. While quantum computing shares foundational principles with the quantum internet, distinguishing between the two is essential for further development and deeper understanding. This work systematically introduces the quantum internet by discussing its importance, core c…
▽ More
As the field of the quantum internet advances, a comprehensive guide to navigate its complexities has become increasingly crucial. While quantum computing shares foundational principles with the quantum internet, distinguishing between the two is essential for further development and deeper understanding. This work systematically introduces the quantum internet by discussing its importance, core components, operational mechanisms, anticipated timeline for viability, key contributors, major challenges, and future directions. Additionally, it presents the fundamental concepts of quantum mechanics that underpin the technology, offering a clear and targeted overview intended for researchers and industry professionals and laying the groundwork for future innovations and research in the field.
△ Less
Submitted 18 March, 2025; v1 submitted 30 January, 2025;
originally announced February 2025.
-
Annotation Tool and Dataset for Fact-Checking Podcasts
Authors:
Vinay Setty,
Adam James Becker
Abstract:
Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is a challenging task, requiring transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool offers a novel approach to tackle these challenges by enabling real-time annotation of podcasts d…
▽ More
Podcasts are a popular medium on the web, featuring diverse and multilingual content that often includes unverified claims. Fact-checking podcasts is a challenging task, requiring transcription, annotation, and claim verification, all while preserving the contextual details of spoken content. Our tool offers a novel approach to tackle these challenges by enabling real-time annotation of podcasts during playback. This unique capability allows users to listen to the podcast and annotate key elements, such as check-worthy claims, claim spans, and contextual errors, simultaneously. By integrating advanced transcription models like OpenAI's Whisper and leveraging crowdsourced annotations, we create high-quality datasets to fine-tune multilingual transformer models such as XLM-RoBERTa for tasks like claim detection and stance classification. Furthermore, we release the annotated podcast transcripts and sample annotations with preliminary experiments.
△ Less
Submitted 3 February, 2025;
originally announced February 2025.
-
FedAlign: Federated Domain Generalization with Cross-Client Feature Alignment
Authors:
Sunny Gupta,
Vinay Sutar,
Varunav Singh,
Amit Sethi
Abstract:
Federated Learning (FL) offers a decentralized paradigm for collaborative model training without direct data sharing, yet it poses unique challenges for Domain Generalization (DG), including strict privacy constraints, non-i.i.d. local data, and limited domain diversity. We introduce FedAlign, a lightweight, privacy-preserving framework designed to enhance DG in federated settings by simultaneousl…
▽ More
Federated Learning (FL) offers a decentralized paradigm for collaborative model training without direct data sharing, yet it poses unique challenges for Domain Generalization (DG), including strict privacy constraints, non-i.i.d. local data, and limited domain diversity. We introduce FedAlign, a lightweight, privacy-preserving framework designed to enhance DG in federated settings by simultaneously increasing feature diversity and promoting domain invariance. First, a cross-client feature extension module broadens local domain representations through domain-invariant feature perturbation and selective cross-client feature transfer, allowing each client to safely access a richer domain space. Second, a dual-stage alignment module refines global feature learning by aligning both feature embeddings and predictions across clients, thereby distilling robust, domain-invariant features. By integrating these modules, our method achieves superior generalization to unseen domains while maintaining data privacy and operating with minimal computational and communication overhead.
△ Less
Submitted 26 January, 2025;
originally announced January 2025.
-
Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection?
Authors:
Samantha Min Er Yew,
Xiaofeng Lei,
Jocelyn Hui Lin Goh,
Yibing Chen,
Sahana Srinivasan,
Miao-li Chee,
Krithi Pushpanathan,
Ke Zou,
Qingshan Hou,
Zhi Da Soh,
Cancan Xue,
Marco Chak Yan Yu,
Charumathi Sabanayagam,
E Shyong Tai,
Xueling Sim,
Yaxing Wang,
Jost B. Jonas,
Vinay Nangia,
Gabriel Dawei Yang,
Emma Anran Ran,
Carol Yim-Lui Cheung,
Yangqin Feng,
Jun Zhou,
Rick Siow Mong Goh,
Yukun Zhou
, et al. (4 additional authors not shown)
Abstract:
Background: RETFound, a self-supervised, retina-specific foundation model (FM), showed potential in downstream applications. However, its comparative performance with traditional deep learning (DL) models remains incompletely understood. This study aimed to evaluate RETFound against three ImageNet-pretrained supervised DL models (ResNet50, ViT-base, SwinV2) in detecting ocular and systemic disease…
▽ More
Background: RETFound, a self-supervised, retina-specific foundation model (FM), showed potential in downstream applications. However, its comparative performance with traditional deep learning (DL) models remains incompletely understood. This study aimed to evaluate RETFound against three ImageNet-pretrained supervised DL models (ResNet50, ViT-base, SwinV2) in detecting ocular and systemic diseases.
Methods: We fine-tuned/trained RETFound and three DL models on full datasets, 50%, 20%, and fixed sample sizes (400, 200, 100 images, with half comprising disease cases; for each DR severity class, 100 and 50 cases were used. Fine-tuned models were tested internally using the SEED (53,090 images) and APTOS-2019 (3,672 images) datasets and externally validated on population-based (BES, CIEMS, SP2, UKBB) and open-source datasets (ODIR-5k, PAPILA, GAMMA, IDRiD, MESSIDOR-2). Model performance was compared using area under the receiver operating characteristic curve (AUC) and Z-tests with Bonferroni correction (P<0.05/3).
Interpretation: Traditional DL models are mostly comparable to RETFound for ocular disease detection with large datasets. However, RETFound is superior in systemic disease detection with smaller datasets. These findings offer valuable insights into the respective merits and limitation of traditional models and FMs.
△ Less
Submitted 21 January, 2025;
originally announced January 2025.
-
Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs
Authors:
Nilesh Dhulshette,
Sapan Shah,
Vinay Kulkarni
Abstract:
In large-scale software development, understanding the functionality and intent behind complex codebases is critical for effective development and maintenance. While code summarization has been widely studied, existing methods primarily focus on smaller code units, such as functions, and struggle with larger code artifacts like files and packages. Additionally, current summarization models tend to…
▽ More
In large-scale software development, understanding the functionality and intent behind complex codebases is critical for effective development and maintenance. While code summarization has been widely studied, existing methods primarily focus on smaller code units, such as functions, and struggle with larger code artifacts like files and packages. Additionally, current summarization models tend to emphasize low-level implementation details, often overlooking the domain and business context that are crucial for real-world applications. This paper proposes a two-step hierarchical approach for repository-level code summarization, tailored to business applications. First, smaller code units such as functions and variables are identified using syntax analysis and summarized with local LLMs. These summaries are then aggregated to generate higher-level file and package summaries. To ensure the summaries are grounded in business context, we design custom prompts that capture the intended purpose of code artifacts based on the domain and problem context of the business application. We evaluate our approach on a business support system (BSS) for the telecommunications domain, showing that syntax analysis-based hierarchical summarization improves coverage, while business-context grounding enhances the relevance of the generated summaries.
△ Less
Submitted 14 January, 2025;
originally announced January 2025.
-
Dynami-CAL GraphNet: A Physics-Informed Graph Neural Network Conserving Linear and Angular Momentum for Dynamical Systems
Authors:
Vinay Sharma,
Olga Fink
Abstract:
Accurate, interpretable, and real-time modeling of multi-body dynamical systems is essential for predicting behaviors and inferring physical properties in natural and engineered environments. Traditional physics-based models face scalability challenges and are computationally demanding, while data-driven approaches like Graph Neural Networks (GNNs) often lack physical consistency, interpretability…
▽ More
Accurate, interpretable, and real-time modeling of multi-body dynamical systems is essential for predicting behaviors and inferring physical properties in natural and engineered environments. Traditional physics-based models face scalability challenges and are computationally demanding, while data-driven approaches like Graph Neural Networks (GNNs) often lack physical consistency, interpretability, and generalization. In this paper, we propose Dynami-CAL GraphNet, a Physics-Informed Graph Neural Network that integrates the learning capabilities of GNNs with physics-based inductive biases to address these limitations. Dynami-CAL GraphNet enforces pairwise conservation of linear and angular momentum for interacting nodes using edge-local reference frames that are equivariant to rotational symmetries, invariant to translations, and equivariant to node permutations. This design ensures physically consistent predictions of node dynamics while offering interpretable, edge-wise linear and angular impulses resulting from pairwise interactions. Evaluated on a 3D granular system with inelastic collisions, Dynami-CAL GraphNet demonstrates stable error accumulation over extended rollouts, effective extrapolations to unseen configurations, and robust handling of heterogeneous interactions and external forces. Dynami-CAL GraphNet offers significant advantages in fields requiring accurate, interpretable, and real-time modeling of complex multi-body dynamical systems, such as robotics, aerospace engineering, and materials science. By providing physically consistent and scalable predictions that adhere to fundamental conservation laws, it enables the inference of forces and moments while efficiently handling heterogeneous interactions and external forces.
△ Less
Submitted 13 January, 2025;
originally announced January 2025.
-
MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention
Authors:
Aadya Arora,
Vinay Namboodiri
Abstract:
With the popularity of foundational models, parameter efficient fine tuning has become the defacto approach to leverage pretrained models to perform downstream tasks. Taking inspiration from recent advances in large language models, Visual Prompt Tuning, and similar techniques, learn an additional prompt to efficiently finetune a pretrained vision foundational model. However, we observe that such…
▽ More
With the popularity of foundational models, parameter efficient fine tuning has become the defacto approach to leverage pretrained models to perform downstream tasks. Taking inspiration from recent advances in large language models, Visual Prompt Tuning, and similar techniques, learn an additional prompt to efficiently finetune a pretrained vision foundational model. However, we observe that such prompting is insufficient for fine-grained visual classification tasks such as medical image classification, where there is large inter-class variance, and small intra-class variance. Hence, in this paper we propose to leverage advanced segmentation capabilities of Segment Anything Model 2 (SAM2) as a visual prompting cue to help visual encoder in the CLIP (Contrastive Language-Image Pretraining) by guiding the attention in CLIP visual encoder to relevant regions in the image. This helps the model to focus on highly discriminative regions, without getting distracted from visually similar background features, an essential requirement in a fewshot, finegrained classification setting. We evaluate our method on diverse medical datasets including X-rays, CT scans, and MRI images, and report an accuracy of (71%, 81%, 86%, 58%) from the proposed approach on (COVID, lung-disease, brain-tumor, breast-cancer) datasets against (66%, 70%, 68%, 29%) from a pretrained CLIP model after fewshot training. The proposed approach also allows to obtain interpretable explanation for the classification performance through the localization obtained using segmentation.
△ Less
Submitted 7 January, 2025;
originally announced January 2025.
-
Outlier-Robust Linear System Identification Under Heavy-tailed Noise
Authors:
Vinay Kanakeri,
Aritra Mitra
Abstract:
We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., "light-tailed". In sharp contrast, we…
▽ More
We consider the problem of estimating the state transition matrix of a linear time-invariant (LTI) system, given access to multiple independent trajectories sampled from the system. Several recent papers have conducted a non-asymptotic analysis of this problem, relying crucially on the assumption that the process noise is either Gaussian or sub-Gaussian, i.e., "light-tailed". In sharp contrast, we work under a significantly weaker noise model, assuming nothing more than the existence of the fourth moment of the noise distribution. For this setting, we provide the first set of results demonstrating that one can obtain sample-complexity bounds for linear system identification that are nearly of the same order as under sub-Gaussian noise. To achieve such results, we develop a novel robust system identification algorithm that relies on constructing multiple weakly-concentrated estimators, and then boosting their performance using suitable tools from high-dimensional robust statistics. Interestingly, our analysis reveals how the kurtosis of the noise distribution, a measure of heavy-tailedness, affects the number of trajectories needed to achieve desired estimation error bounds. Finally, we show that our algorithm and analysis technique can be easily extended to account for scenarios where an adversary can arbitrarily corrupt a small fraction of the collected trajectory data. Our work takes the first steps towards building a robust statistical learning theory for control under non-ideal assumptions on the data-generating process.
△ Less
Submitted 27 May, 2025; v1 submitted 31 December, 2024;
originally announced January 2025.
-
ObitoNet: Multimodal High-Resolution Point Cloud Reconstruction
Authors:
Apoorv Thapliyal,
Vinay Lanka,
Swathi Baskaran
Abstract:
ObitoNet employs a Cross Attention mechanism to integrate multimodal inputs, where Vision Transformers (ViT) extract semantic features from images and a point cloud tokenizer processes geometric information using Farthest Point Sampling (FPS) and K Nearest Neighbors (KNN) for spatial structure capture. The learned multimodal features are fed into a transformer-based decoder for high-resolution poi…
▽ More
ObitoNet employs a Cross Attention mechanism to integrate multimodal inputs, where Vision Transformers (ViT) extract semantic features from images and a point cloud tokenizer processes geometric information using Farthest Point Sampling (FPS) and K Nearest Neighbors (KNN) for spatial structure capture. The learned multimodal features are fed into a transformer-based decoder for high-resolution point cloud reconstruction. This approach leverages the complementary strengths of both modalities rich image features and precise geometric details ensuring robust point cloud generation even in challenging conditions such as sparse or noisy data.
△ Less
Submitted 24 December, 2024;
originally announced December 2024.
-
On the Feasibility of Vision-Language Models for Time-Series Classification
Authors:
Vinay Prithyani,
Mohsin Mohammed,
Richa Gadgil,
Ricardo Buitrago,
Vinija Jain,
Aman Chadha
Abstract:
We build upon time-series classification by leveraging the capabilities of Vision Language Models (VLMs). We find that VLMs produce competitive results after two or less epochs of fine-tuning. We develop a novel approach that incorporates graphical data representations as images in conjunction with numerical data. This approach is rooted in the hypothesis that graphical representations can provide…
▽ More
We build upon time-series classification by leveraging the capabilities of Vision Language Models (VLMs). We find that VLMs produce competitive results after two or less epochs of fine-tuning. We develop a novel approach that incorporates graphical data representations as images in conjunction with numerical data. This approach is rooted in the hypothesis that graphical representations can provide additional contextual information that numerical data alone may not capture. Additionally, providing a graphical representation can circumvent issues such as limited context length faced by LLMs. To further advance this work, we implemented a scalable end-to-end pipeline for training on different scenarios, allowing us to isolate the most effective strategies for transferring learning capabilities from LLMs to Time Series Classification (TSC) tasks. Our approach works with univariate and multivariate time-series data. In addition, we conduct extensive and practical experiments to show how this approach works for time-series classification and generative labels.
△ Less
Submitted 17 January, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
Authors:
Yijia Shao,
Vinay Samuel,
Yucheng Jiang,
John Yang,
Diyi Yang
Abstract:
Recent advancements in language models (LMs) have sparked growing interest in developing LM agents. While fully autonomous agents could excel in many scenarios, numerous use cases inherently require them to collaborate with humans due to humans' latent preferences, domain expertise, or need for control. To facilitate the study of human-agent collaboration, we present Collaborative Gym (Co-Gym), a…
▽ More
Recent advancements in language models (LMs) have sparked growing interest in developing LM agents. While fully autonomous agents could excel in many scenarios, numerous use cases inherently require them to collaborate with humans due to humans' latent preferences, domain expertise, or need for control. To facilitate the study of human-agent collaboration, we present Collaborative Gym (Co-Gym), a general framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions, and propose an evaluation framework that assesses both the collaboration outcomes and processes. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance within those delivered cases, achieving win rates of 86% in Travel Planning, 74% in Tabular Analysis, and 66% in Related Work when evaluated by real users. However, our study also highlights significant challenges in developing collaborative agents, requiring advancements in core aspects of intelligence -- communication capabilities, situational awareness, and balancing autonomy and human control.
△ Less
Submitted 16 January, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
GASP: Gaussian Avatars with Synthetic Priors
Authors:
Jack Saunders,
Charlie Hewitt,
Yanan Jian,
Marek Kowalski,
Tadas Baltrusaitis,
Yiye Chen,
Darren Cosker,
Virginia Estellers,
Nicholas Gyde,
Vinay P. Namboodiri,
Benjamin E Lundell
Abstract:
Gaussian Splatting has changed the game for real-time photo-realistic rendering. One of the most popular applications of Gaussian Splatting is to create animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of quality and rendering efficiency but suffer from two main limitations. Either they require expensive multi-camera rigs to produce avatars with free-view rend…
▽ More
Gaussian Splatting has changed the game for real-time photo-realistic rendering. One of the most popular applications of Gaussian Splatting is to create animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of quality and rendering efficiency but suffer from two main limitations. Either they require expensive multi-camera rigs to produce avatars with free-view rendering, or they can be trained with a single camera but only rendered at high quality from this fixed viewpoint. An ideal model would be trained using a short monocular video or image from available hardware, such as a webcam, and rendered from any view. To this end, we propose GASP: Gaussian Avatars with Synthetic Priors. To overcome the limitations of existing datasets, we exploit the pixel-perfect nature of synthetic data to train a Gaussian Avatar prior. By fitting this prior model to a single photo or video and fine-tuning it, we get a high-quality Gaussian Avatar, which supports 360$^\circ$ rendering. Our prior is only required for fitting, not inference, enabling real-time application. Through our method, we obtain high-quality, animatable Avatars from limited data which can be animated and rendered at 70fps on commercial hardware. See our project page (https://microsoft.github.io/GASP/) for results.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
SCADE: Scalable Framework for Anomaly Detection in High-Performance System
Authors:
Vaishali Vinay,
Anjali Mangal
Abstract:
As command-line interfaces remain integral to high-performance computing environments, the risk of exploitation through stealthy and complex command-line abuse grows. Conventional security solutions struggle to detect these anomalies due to their context-specific nature, lack of labeled data, and the prevalence of sophisticated attacks like Living-off-the-Land (LOL). To address this gap, we introd…
▽ More
As command-line interfaces remain integral to high-performance computing environments, the risk of exploitation through stealthy and complex command-line abuse grows. Conventional security solutions struggle to detect these anomalies due to their context-specific nature, lack of labeled data, and the prevalence of sophisticated attacks like Living-off-the-Land (LOL). To address this gap, we introduce the Scalable Command-Line Anomaly Detection Engine (SCADE), a framework that combines global statistical models with local context-specific analysis for unsupervised anomaly detection. SCADE leverages novel statistical methods, including BM25 and Log Entropy, alongside dynamic thresholding to adaptively detect rare, malicious command-line patterns in low signal-to-noise ratio (SNR) environments. Experimental results show that SCADE achieves above 98% SNR in identifying anomalous behavior while minimizing false positives. Designed for scalability and precision, SCADE provides an innovative, metadata-enriched approach to anomaly detection, offering a robust solution for cybersecurity in high-computation environments. This work presents SCADE's architecture, detection methodology, and its potential for enhancing anomaly detection in enterprise systems. We argue that SCADE represents a significant advancement in unsupervised anomaly detection, offering a robust, adaptive framework for security analysts and researchers seeking to enhance detection accuracy in high-computation environments.
△ Less
Submitted 9 December, 2024; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Representation Learning for Time-Domain High-Energy Astrophysics: Discovery of Extragalactic Fast X-ray Transient XRT 200515
Authors:
Steven Dillmann,
Juan Rafael MartÃnez-Galarza,
Roberto Soria,
Rosanne Di Stefano,
Vinay L. Kashyap
Abstract:
We present a novel representation learning method for downstream tasks like anomaly detection, unsupervised classification, and similarity searches in high-energy data sets. This enabled the discovery of a new extragalactic fast X-ray transient (FXT) in Chandra archival data, XRT 200515, a needle-in-the-haystack event and the first Chandra FXT of its kind. Recent serendipitous discoveries in X-ray…
▽ More
We present a novel representation learning method for downstream tasks like anomaly detection, unsupervised classification, and similarity searches in high-energy data sets. This enabled the discovery of a new extragalactic fast X-ray transient (FXT) in Chandra archival data, XRT 200515, a needle-in-the-haystack event and the first Chandra FXT of its kind. Recent serendipitous discoveries in X-ray astronomy, including FXTs from binary neutron star mergers and an extragalactic planetary transit candidate, highlight the need for systematic transient searches in X-ray archives. We introduce new event file representations, E-t maps and E-t-dt cubes, that effectively encode both temporal and spectral information, enabling the seamless application of machine learning to variable-length event file time series. Our unsupervised learning approach employs PCA or sparse autoencoders to extract low-dimensional, informative features from these data representations, followed by clustering in the embedding space with DBSCAN. New transients are identified within transient-dominant clusters or through nearest-neighbour searches around known transients, producing a catalogue of 3559 candidates (3447 flares and 112 dips). XRT 200515 exhibits unique temporal and spectral variability, including an intense, hard <10s initial burst, followed by spectral softening in an ~800s oscillating tail. We interpret XRT 200515 as either the first giant magnetar flare observed at low X-ray energies or the first extragalactic Type I X-ray burst from a faint, previously unknown low-mass X-ray binary in the LMC. Our method extends to data sets from other observatories such as XMM-Newton, Swift-XRT, eROSITA, Einstein Probe, and upcoming missions like AXIS.
△ Less
Submitted 3 March, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models
Authors:
Vinay Kumar Sankarapu,
Chintan Chitroda,
Yashwardhan Rathore,
Neeraj Kumar Singh,
Pratinav Seth
Abstract:
The rapid growth of AI has led to more complex deep learning models, often operating as opaque "black boxes" with limited transparency in their decision-making. This lack of interpretability poses challenges, especially in high-stakes applications where understanding model output is crucial. This work highlights the importance of interpretability in fostering trust, accountability, and responsible…
▽ More
The rapid growth of AI has led to more complex deep learning models, often operating as opaque "black boxes" with limited transparency in their decision-making. This lack of interpretability poses challenges, especially in high-stakes applications where understanding model output is crucial. This work highlights the importance of interpretability in fostering trust, accountability, and responsible deployment. To address these challenges, we introduce DLBacktrace, a novel, model-agnostic technique designed to provide clear insights into deep learning model decisions across a wide range of domains and architectures, including MLPs, CNNs, and Transformer-based LLM models. We present a comprehensive overview of DLBacktrace and benchmark its performance against established interpretability methods such as SHAP, LIME, and GradCAM. Our results demonstrate that DLBacktrace effectively enhances understanding of model behavior across diverse tasks. DLBacktrace is compatible with models developed in both PyTorch and TensorFlow, supporting architectures such as BERT, ResNet, U-Net, and custom DNNs for tabular data. The library is open-sourced and available at https://github.com/AryaXAI/DLBacktrace .
△ Less
Submitted 4 February, 2025; v1 submitted 19 November, 2024;
originally announced November 2024.
-
Mitigating Gender Bias in Contextual Word Embeddings
Authors:
Navya Yarrabelly,
Vinay Damodaran,
Feng-Guang Su
Abstract:
Word embeddings have been shown to produce remarkable results in tackling a vast majority of NLP related tasks. Unfortunately, word embeddings also capture the stereotypical biases that are prevalent in society, affecting the predictive performance of the embeddings when used in downstream tasks. While various techniques have been proposed \cite{bolukbasi2016man, zhao2018learning} and criticized\c…
▽ More
Word embeddings have been shown to produce remarkable results in tackling a vast majority of NLP related tasks. Unfortunately, word embeddings also capture the stereotypical biases that are prevalent in society, affecting the predictive performance of the embeddings when used in downstream tasks. While various techniques have been proposed \cite{bolukbasi2016man, zhao2018learning} and criticized\cite{gonen2019lipstick} for static embeddings, very little work has focused on mitigating bias in contextual embeddings. In this paper, we propose a novel objective function for MLM(Masked-Language Modeling) which largely mitigates the gender bias in contextual embeddings and also preserves the performance for downstream tasks. Since previous works on measuring bias in contextual embeddings lack in normative reasoning, we also propose novel evaluation metrics that are straight-forward and aligned with our motivations in debiasing. We also propose new methods for debiasing static embeddings and provide empirical proof via extensive analysis and experiments, as to why the main source of bias in static embeddings stems from the presence of stereotypical names rather than gendered words themselves. All experiments and embeddings studied are in English, unless otherwise specified.\citep{bender2011achieving}.
△ Less
Submitted 18 November, 2024;
originally announced November 2024.