-
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Authors:
Rizwan Qureshi,
Ranjan Sapkota,
Abbas Shah,
Amgad Muneer,
Anas Zafar,
Ashmal Vayani,
Maged Shoman,
Abdelrahman B. M. Eldaly,
Kai Zhang,
Ferhat Sadak,
Shaina Raza,
Xinqi Fan,
Ravid Shwartz-Ziv,
Hong Yan,
Vinija Jain,
Aman Chadha,
Manoj Karkee,
Jia Wu,
Philip Torr,
Seyedali Mirjalili
Abstract:
Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.
Submitted 1 July, 2025;
originally announced July 2025.
-
Twill: Scheduling Compound AI Systems on Heterogeneous Mobile Edge Platforms
Authors:
Zain Taufique,
Aman Vyas,
Antonio Miele,
Pasi Liljeberg,
Anil Kanduri
Abstract:
Compound AI (cAI) systems chain multiple AI models to solve complex problems. cAI systems are typically composed of deep neural networks (DNNs), transformers, and large language models (LLMs), exhibiting a high degree of computational diversity and dynamic workload variation. Deploying cAI services on mobile edge platforms poses a significant challenge in scheduling concurrent DNN-transformer inference tasks, which arrive dynamically in an unknown sequence. Existing mobile edge AI inference strategies manage multi-DNN or transformer-only workloads, relying on design-time profiling, and cannot handle concurrent inference of DNNs and transformers required by cAI systems. In this work, we address the challenge of scheduling cAI systems on heterogeneous mobile edge platforms. We present Twill, a run-time framework to handle concurrent inference requests of cAI workloads through task affinity-aware cluster mapping and migration, priority-aware task freezing/unfreezing, and DVFS, while minimizing inference latency within power budgets. We implement and deploy our Twill framework on the Nvidia Jetson Orin NX platform. We evaluate Twill against state-of-the-art edge AI inference techniques over contemporary DNNs and LLMs, reducing inference latency by 54% on average, while honoring power budgets.
Submitted 1 July, 2025;
originally announced July 2025.
-
On the Study of Weighted Fractional Cumulative Residual Inaccuracy and its Dynamical Version with Applications
Authors:
Aman Pandey,
Chanchal Kundu
Abstract:
In recent years, there has been a growing interest in information measures that quantify inaccuracy and uncertainty in systems. In this paper, we introduce a novel concept called the Weighted Fractional Cumulative Residual Inaccuracy (WFCRI). We develop several fundamental properties of WFCRI and establish important bounds that reveal its analytical behavior. Further, we examine the behavior of WFCRI under a mixture hazard model. A dynamic version of WFCRI is also proposed, and its behavior is studied under the proportional hazard rate model. An empirical estimation method for WFCRI under the proportional hazard rate model framework is also proposed, and its performance is evaluated through simulation studies. Finally, we demonstrate the utility of the WFCRI measure in characterizing chaotic dynamics by applying it to the Ricker and cubic maps. The proposed measure is also applied to real data to assess the uncertainty.
Submitted 28 June, 2025;
originally announced June 2025.
-
Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images
Authors:
Shreyas Dixit,
Ashhar Aziz,
Shashwat Bajpai,
Vasu Sharma,
Aman Chadha,
Vinija Jain,
Amitava Das
Abstract:
A report by the European Union Law Enforcement Agency predicts that by 2026, up to 90 percent of online content could be synthetically generated, raising concerns among policymakers, who cautioned that "Generative AI could act as a force multiplier for political disinformation. The combined effect of generative text, images, videos, and audio may surpass the influence of any single modality." In response, California's Bill AB 3211 mandates the watermarking of AI-generated images, videos, and audio. However, concerns remain regarding the vulnerability of invisible watermarking techniques to tampering and the potential for malicious actors to bypass them entirely. Generative AI-powered de-watermarking attacks, especially the newly introduced visual paraphrase attack, have shown an ability to fully remove watermarks, resulting in a paraphrase of the original image. This paper introduces PECCAVI, the first visual paraphrase attack-safe and distortion-free image watermarking technique. In visual paraphrase attacks, an image is altered while preserving its core semantic regions, termed Non-Melting Points (NMPs). PECCAVI strategically embeds watermarks within these NMPs and employs multi-channel frequency domain watermarking. It also incorporates noisy burnishing to counter reverse-engineering efforts aimed at locating NMPs to disrupt the embedded watermark, thereby enhancing durability. PECCAVI is model-agnostic. All relevant resources and codes will be open-sourced.
Submitted 28 June, 2025;
originally announced June 2025.
-
QuickSilver -- Speeding up LLM Inference through Dynamic Token Halting, KV Skipping, Contextual Token Fusion, and Adaptive Matryoshka Quantization
Authors:
Danush Khanna,
Aditya Kumar Guru,
Srivarshinee Sridhar,
Zidan Ahmed,
Rubhav Bahirwani,
Meetu Malhotra,
Vinija Jain,
Aman Chadha,
Amitava Das,
Kripabandhu Ghosh
Abstract:
Inference accounts for the majority of latency and energy consumption in large language model (LLM) deployments, often exceeding 90% of total cost. While training-time efficiency has seen extensive progress, runtime optimization remains a key bottleneck, particularly under autoregressive decoding. Existing approaches -- such as pruning, quantization, early exits, and speculative decoding -- often require retraining, architectural changes, or disrupt decoding compatibility. We introduce QuickSilver, a modular, token-level framework that enables semantic adaptivity at inference time without altering model weights or structure. QuickSilver integrates four synergistic mechanisms:
(i) Dynamic Token Halting, which halts computation for tokens with converged representations; (ii) KV Cache Skipping, which selectively suppresses memory writes to reduce attention overhead; and (iii) Contextual Token Fusion, which collapses redundant tokens into shared paths to shrink sequence length.
Unlike speculative decoding or MoE routing, QuickSilver operates entirely on frozen, dense models and requires no auxiliary networks. Applied to GPT-2 and Llama-2 across WikiText-103 and C4, QuickSilver achieves up to 39.6% FLOP reduction with negligible perplexity degradation (<=0.2).
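As a rough illustration of the Dynamic Token Halting idea described above, the following minimal sketch stops applying layers to a token once its representation stops changing. The convergence test (relative L2 change below `tol`), the function name `run_with_halting`, and the token-wise toy layers are all assumptions made for illustration; the abstract does not specify QuickSilver's actual halting criterion, and real transformer layers also mix information across tokens.

```python
import math

def run_with_halting(hidden, layers, tol=1e-2):
    """Apply `layers` in order, skipping tokens whose state has converged.

    hidden: list of token vectors (each a list of floats).
    layers: list of callables mapping one token vector to a new vector
            (a token-wise stand-in for a real transformer layer).
    tol:    hypothetical threshold on the relative L2 change per layer.
    Returns the final vectors and how many layers each token executed.
    """
    hidden = [[float(x) for x in vec] for vec in hidden]
    active = [True] * len(hidden)
    steps = [0] * len(hidden)
    for layer in layers:
        if not any(active):
            break  # every token has halted; remaining layers are skipped
        for i, vec in enumerate(hidden):
            if not active[i]:
                continue  # halted token: no computation at this layer
            updated = layer(vec)
            delta = math.dist(updated, vec)      # size of this layer's update
            scale = math.hypot(*vec) + 1e-12     # guard against zero vectors
            hidden[i] = updated
            steps[i] += 1
            if delta / scale < tol:
                active[i] = False                # representation has converged
    return hidden, steps
```

In this sketch the per-token `steps` count is what a FLOP saving would be measured against: a token that halts after one layer skips the cost of every later layer.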
Submitted 27 June, 2025;
originally announced June 2025.
-
Fermionic S-matrix and cosmological correlators: T-violation at O(H)
Authors:
Aman Goyal,
Aneek Jana,
Swapnanil Mandal,
Aninda Sinha
Abstract:
We study the Bunch-Davies (BD) and Unruh-de Witt (UdW) de Sitter S-matrices in the presence of spin-$1/2$ fermions. Building on recent work, this enables us to correlate the de Sitter S-matrix with cosmological correlators. We consider a finite-time version of the UdW S-matrix to study $O(H)$ corrections to some typical particle physics processes such as beta decay. Owing to the lack of time-reversal symmetry in the expanding Poincaré patch, we find signatures of intrinsic T-violation in polarized beta decay. The observable we study begins at $O(H)$. The possibility of T-violation was examined theoretically in the 1950s by Jackson, Treiman, and Wyld in flat space and has been probed more recently in the emiT experiment, with the purpose of examining fundamental T-violation coming from additional interactions in the Lagrangian. Our analysis places a lower bound on the intrinsic T-violation in the expanding Poincaré patch. At $O(H)$, we find both energy-conserving and energy non-conserving contributions. Surprisingly, the energy-violating piece can, in principle, give large T-violation at fine-tuned values of the kinematical variables.
Submitted 24 June, 2025;
originally announced June 2025.
-
Copula-Based Modeling of Fractional Inaccuracy: A Unified Framework
Authors:
Aman Pandey,
Chanchal Kundu
Abstract:
We introduce novel information-theoretic measures termed the multivariate cumulative copula fractional inaccuracy measure and the multivariate survival copula fractional inaccuracy measure, constructed respectively from multivariate copulas and multivariate survival copulas. These measures generalize the concept of fractional inaccuracy to multivariate settings by incorporating dependence structures through copulas. We establish bounds for these measures using the Frechet-Hoeffding bounds and investigate their behavior under lower and upper orthant stochastic orderings to facilitate comparative analysis. Furthermore, we define the multivariate co-copula fractional inaccuracy measure and the multivariate dual copula fractional inaccuracy measure, derived from the multivariate co-copula and dual copula, respectively, and examine several analogous properties for these extended forms.
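For reference, the Frechet-Hoeffding bounds mentioned above are the standard pointwise bounds satisfied by every $d$-dimensional copula $C$. The symbols $W$ and $M$ below follow common textbook usage and are not taken from the abstract; how the bounds transfer to the fractional inaccuracy measures is developed in the paper itself and is not reproduced here.

```latex
% Frechet-Hoeffding bounds for a d-dimensional copula C,
% valid for all (u_1, ..., u_d) in [0,1]^d.
\[
  W(u_1,\dots,u_d) \;\le\; C(u_1,\dots,u_d) \;\le\; M(u_1,\dots,u_d),
\]
\[
  W(u_1,\dots,u_d) = \max\!\Bigl(\sum_{i=1}^{d} u_i - d + 1,\; 0\Bigr),
  \qquad
  M(u_1,\dots,u_d) = \min(u_1,\dots,u_d).
\]
```

The upper bound $M$ is itself a copula in every dimension, whereas the lower bound $W$ is a copula only for $d = 2$.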
Submitted 24 June, 2025;
originally announced June 2025.
-
Spiritual-LLM : Gita Inspired Mental Health Therapy In the Era of LLMs
Authors:
Janak Kapuriya,
Aman Singh,
Jainendra Shukla,
Rajiv Ratn Shah
Abstract:
Traditional mental health support systems often generate responses based solely on the user's current emotion and situation, resulting in superficial interventions that fail to address deeper emotional needs. This study introduces a novel framework that integrates spiritual wisdom from the Bhagavad Gita with the advanced large language model GPT-4o to enhance emotional well-being. We present the GITes (Gita Integrated Therapy for Emotional Support) dataset, which enhances the existing ExTES mental health dataset by including 10,729 spiritually guided responses generated by GPT-4o and evaluated by domain experts. We benchmark GITes against 12 state-of-the-art LLMs, including both mental-health-specific and general-purpose models. To evaluate spiritual relevance in generated responses beyond what conventional n-gram-based metrics capture, we propose a novel Spiritual Insight metric and automate assessment via an LLM-as-jury framework using chain-of-thought prompting. Integrating spiritual guidance into AI-driven support enhances both NLP and spiritual metrics for the best-performing LLM, Phi3-Mini 3.2B Instruct, achieving improvements of 122.71% in ROUGE, 126.53% in METEOR, 8.15% in BERT score, 15.92% in Spiritual Insight, 18.61% in Sufficiency and 13.22% in Relevance compared to its zero-shot counterpart. While these results reflect substantial improvements across automated empathy and spirituality metrics, further validation in real-world patient populations remains a necessary step. Our findings indicate a strong potential for AI systems enriched with spiritual guidance to enhance user satisfaction and perceived support outcomes. The code and dataset will be publicly available to advance further research in this emerging area.
Submitted 23 June, 2025;
originally announced June 2025.
-
Deep CNN Face Matchers Inherently Support Revocable Biometric Templates
Authors:
Aman Bhatta,
Michael C. King,
Kevin W. Bowyer
Abstract:
One common critique of biometric authentication is that if an individual's biometric is compromised, then the individual has no recourse. The concept of revocable biometrics was developed to address this concern. A biometric scheme is revocable if an individual can have their current enrollment in the scheme revoked, so that the compromised biometric template becomes worthless, and the individual can re-enroll with a new template that has similar recognition power. We show that modern deep CNN face matchers inherently allow for a robust revocable biometric scheme. For a given state-of-the-art deep CNN backbone and training set, it is possible to generate an unlimited number of distinct face matcher models that have both (1) equivalent recognition power, and (2) strongly incompatible biometric templates. The equivalent recognition power extends to the point of generating impostor and genuine distributions that have the same shape and placement on the similarity dimension, meaning that the models can share a similarity threshold for a 1-in-10,000 false match rate. The biometric templates from different model instances are so strongly incompatible that the cross-instance similarity score for images of the same person is typically lower than the same-instance similarity score for images of different persons. That is, a stolen biometric template that is revoked is of less value in attempting to match the re-enrolled identity than the average impostor template. We also explore the feasibility of using a Vision Transformer (ViT) backbone-based face matcher in the revocable biometric system proposed in this work and demonstrate that it is less suitable compared to typical ResNet-based deep CNN backbones.
Submitted 23 June, 2025;
originally announced June 2025.
-
Sparse Feature Coactivation Reveals Composable Semantic Modules in Large Language Models
Authors:
Ruixuan Deng,
Xiaoyang Hu,
Miles Gilberti,
Shane Storks,
Aman Taxali,
Mike Angstadt,
Chandra Sripada,
Joyce Chai
Abstract:
We identify semantically coherent, context-consistent network components in large language models (LLMs) using coactivation of sparse autoencoder (SAE) features collected from just a handful of prompts. Focusing on country-relation tasks, we show that ablating semantic components for countries and relations changes model outputs in predictable ways, while amplifying these components induces counterfactual responses. Notably, composing relation and country components yields compound counterfactual outputs. We find that, whereas most country components emerge from the very first layer, the more abstract relation components are concentrated in later layers. Furthermore, within relation components themselves, nodes from later layers tend to have a stronger causal impact on model outputs. Overall, these findings suggest a modular organization of knowledge within LLMs and advance methods for efficient, targeted model manipulation.
Submitted 22 June, 2025;
originally announced June 2025.
-
Mental Health Equity in LLMs: Leveraging Multi-Hop Question Answering to Detect Amplified and Silenced Perspectives
Authors:
Batool Haider,
Atmika Gorti,
Aman Chadha,
Manas Gaur
Abstract:
Large Language Models (LLMs) in mental healthcare risk propagating biases that reinforce stigma and harm marginalized groups. While previous research identified concerning trends, systematic methods for detecting intersectional biases remain limited. This work introduces a multi-hop question answering (MHQA) framework to explore LLM response biases in mental health discourse. We analyze content from the Interpretable Mental Health Instruction (IMHI) dataset across symptom presentation, coping mechanisms, and treatment approaches. Using systematic tagging across age, race, gender, and socioeconomic status, we investigate bias patterns at demographic intersections. We evaluate four LLMs: Claude 3.5 Sonnet, Jamba 1.6, Gemma 3, and Llama 4, revealing systematic disparities across sentiment, demographics, and mental health conditions. Our MHQA approach demonstrates superior detection compared to conventional methods, identifying amplification points where biases magnify through sequential reasoning. We implement two debiasing techniques: Roleplay Simulation and Explicit Bias Reduction, achieving 66-94% bias reductions through few-shot prompting with BBQ dataset examples. These findings highlight critical areas where LLMs reproduce mental healthcare biases, providing actionable insights for equitable AI development.
Submitted 22 June, 2025;
originally announced June 2025.
-
The Indian Pulsar Timing Array Data Release 2: I. Dataset and Timing Analysis
Authors:
Prerna Rana,
Pratik Tarafdar,
Nobleson K,
Churchil Dwivedi,
Bhal Chandra Joshi,
Debabrata Deb,
Sushovan Mondal,
M. A. Krishnakumar,
Adya Shukla,
Jaikhomba Singha,
Himanshu Grover,
Hemanga Tahbildar,
Abhimanyu Susobhanan,
Mayuresh Surnis,
Shantanu Desai,
Neelam Dhanda Batra,
Aman Srivastava,
Vinay Bharambe,
Jibin Jose,
Vaishnavi Vyasraj,
Shebin Jose Jacob,
Amarnath,
Manpreet Singh,
Zenia Zuraiq,
Sarbartha Sengupta
, et al. (22 additional authors not shown)
Abstract:
The Indian Pulsar Timing Array (InPTA) employs unique features of the upgraded Giant Metrewave Radio Telescope (uGMRT) to monitor dozens of the International Pulsar Timing Array (IPTA) millisecond pulsars (MSPs), simultaneously in the 300-500 MHz and the 1260-1460 MHz bands. This dual-band approach ensures that any frequency-dependent delays are accurately characterized, significantly improving the timing precision for pulsar observations, which is crucial for pulsar timing arrays. We present details of InPTA's second data release that involves 7 yrs of data on 27 IPTA MSPs. This includes sub-banded Times of Arrival (ToAs), Dispersion Measures (DM), and initial timing ephemerides for our MSPs. A part of this dataset, originally released in InPTA's first data release, is being incorporated into IPTA's third data release which is expected to detect and characterize nanohertz gravitational waves in the coming years. The entire dataset is reprocessed in this second data release providing some of the highest precision DM estimates so far and interesting solar wind related DM variations in some pulsars. This is likely to characterize the noise introduced by the dynamic inter-stellar ionised medium much better than the previous release thereby increasing sensitivity to any future gravitational wave search.
Submitted 20 June, 2025;
originally announced June 2025.
-
NepaliGPT: A Generative Language Model for the Nepali Language
Authors:
Shushanta Pudasaini,
Aman Shakya,
Siddhartha Shrestha,
Sahil Bhatta,
Sunil Thapa,
Sushmita Palikhe
Abstract:
Since the release of ChatGPT, Large Language Models (LLMs) have gained huge popularity, and thousands of LLM variants have been released. However, there is no generative language model for the Nepali language, so downstream tasks, including fine-tuning, have not yet been explored. To fill this research gap in the Nepali NLP space, this research proposes NepaliGPT, a generative large language model tailored specifically for the Nepali language. This research introduces an advanced corpus for the Nepali language collected from several sources, called the Devanagari Corpus. Likewise, the research introduces the first NepaliGPT benchmark dataset, comprising 4,296 question-answer pairs in the Nepali language. The proposed LLM NepaliGPT achieves the following metrics in text generation: perplexity of 26.32245, ROUGE-1 score of 0.2604, causal coherence of 81.25%, and causal consistency of 85.41%.
Submitted 19 June, 2025;
originally announced June 2025.
-
Comparison between External and Internal Single Stage Planetary gearbox actuators for legged robots
Authors:
Aman Singh,
Deepak Kapa,
Prasham Chedda,
Shishir N. Y. Kolathaya
Abstract:
Legged robots, such as quadrupeds and humanoids, require high-performance actuators for efficient locomotion. Quasi-Direct-Drive (QDD) actuators with single-stage planetary gearboxes offer low inertia, high efficiency, and transparency. Among planetary gearbox architectures, Internal (ISSPG) and External Single-Stage Planetary Gearbox (ESSPG) are the two predominant designs. While ISSPG is often preferred for its compactness and high torque density at certain gear ratios, no objective comparison between the two architectures exists. Additionally, existing designs rely on heuristics rather than systematic optimization. This paper presents a design framework for optimally selecting actuator parameters based on given performance requirements and motor specifications. Using this framework, we generate and analyze various optimized gearbox designs for both architectures. Our results demonstrate that for the T-motor U12, ISSPG is the superior choice within the lower gear ratio range of 5:1 to 7:1, offering a lighter design. However, for gear ratios exceeding 7:1, ISSPG becomes infeasible, making ESSPG the better option in the 7:1 to 11:1 range. To validate our approach, we designed and optimized two actuators for manufacturing: an ISSPG with a 6.0:1 gear ratio and an ESSPG with a 7.2:1 gear ratio. Their respective masses closely align with our optimization model predictions, confirming the effectiveness of our methodology.
Submitted 19 June, 2025;
originally announced June 2025.
-
FLARE: FCCee b2Luigi Automated Reconstruction And Event processing
Authors:
Cameron Harris,
Aman Desai
Abstract:
FLARE is an open-source data workflow orchestration tool designed for the FCC Analysis software and Key4HEP stack. Powered by b2luigi, FLARE automates and orchestrates the fccanalysis stages from start to finish. Furthermore, FLARE is capable of managing the Monte Carlo (MC) data workflow using generators inside the Key4HEP stack such as Whizard, MadGraph5_aMC@NLO, Pythia8 and Delphes. In this paper, the FLARE v0.1.4 package is explored along with its extensible capabilities and feature-rich work environment. Examples of FLARE are discussed in a variety of use cases, all of which can be found at https://github.com/CamCoop1/FLARE-examples. The open-source repository of FLARE can be found at https://github.com/CamCoop1/FLARE
Submitted 19 June, 2025;
originally announced June 2025.
-
From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents
Authors:
Mohammad Amaan Sayeed,
Mohammed Talha Alam,
Raza Imam,
Shahab Saquib Sohail,
Amir Hussain
Abstract:
Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the agentic prompt adds another 10% improvement through deeper mechanistic insight and safety considerations. Our results demonstrate that blending classical Islamic texts with retrieval and self-evaluation enables reliable, culturally sensitive medical question-answering.
Submitted 22 June, 2025; v1 submitted 18 June, 2025;
originally announced June 2025.
-
InsertRank: LLMs can reason over BM25 scores to Improve Listwise Reranking
Authors:
Rahul Seetharaman,
Kaustubh D. Dhole,
Aman Bansal
Abstract:
Large Language Models (LLMs) have made significant strides across various information retrieval tasks, particularly as rerankers, owing to their strong generalization and knowledge-transfer capabilities acquired from extensive pretraining. In parallel, the rise of LLM-based chat interfaces has raised user expectations, encouraging users to pose more complex queries that necessitate retrieval by "reasoning" over documents rather than through simple keyword matching or semantic similarity. While some recent efforts have exploited the reasoning abilities of LLMs for reranking such queries, considerable potential for improvement remains. In that regard, we introduce InsertRank, an LLM-based reranker that leverages lexical signals like BM25 scores during reranking to further improve retrieval performance. InsertRank demonstrates improved retrieval effectiveness on BRIGHT, a reasoning benchmark spanning 12 diverse domains, and R2MED, a specialized medical reasoning retrieval benchmark spanning 8 different tasks. We conduct an exhaustive evaluation and several ablation studies and demonstrate that InsertRank consistently improves retrieval effectiveness across multiple families of LLMs, including GPT, Gemini, and Deepseek models. In addition, we conduct ablation studies on normalization by varying the scale of the BM25 scores, and on positional bias by shuffling the order of the documents. With Deepseek-R1, InsertRank achieves a score of 37.5 on the BRIGHT benchmark and 51.1 on the R2MED benchmark, surpassing previous methods.
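The core idea of exposing BM25 scores to a listwise reranker can be sketched as follows. This is a minimal illustration under stated assumptions, not InsertRank's implementation: the whitespace tokenizer, the `k1` and `b` defaults, the particular BM25 variant, the helper names `bm25_scores` and `build_rerank_prompt`, and the prompt wording are all hypothetical, and the call to an actual LLM is omitted.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokens
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    n_docs = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def build_rerank_prompt(query, docs, scores):
    """Format a listwise reranking prompt with a BM25 score per passage."""
    lines = [
        f"Query: {query}",
        "Rank the passages from most to least relevant to the query.",
        "Each passage is annotated with its BM25 score as an extra signal.",
    ]
    for i, (doc, s) in enumerate(zip(docs, scores), start=1):
        lines.append(f"[{i}] (BM25: {s:.2f}) {doc}")
    lines.append("Answer with passage numbers, for example: [2] > [1] > [3].")
    return "\n".join(lines)
```

The resulting string would be sent to whichever LLM acts as the reranker. The ablations mentioned in the abstract (rescaling the scores, shuffling passage order) correspond to transforming `scores` or permuting `docs` before calling `build_rerank_prompt`.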
Submitted 16 June, 2025;
originally announced June 2025.
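The abstract above describes passing lexical signals such as BM25 scores to an LLM reranker. A minimal sketch of that idea follows, assuming the scores are simply written next to each passage in a listwise prompt. The function name `build_rerank_prompt`, the prompt wording, and the min-max normalization are illustrative assumptions, not the authors' implementation.

```python
def build_rerank_prompt(query, passages, bm25_scores, normalize=True):
    """Format a listwise reranking prompt that shows each passage's BM25 score.

    Hypothetical helper: the prompt layout is an assumption for illustration.
    """
    if len(passages) != len(bm25_scores):
        raise ValueError("passages and bm25_scores must have the same length")
    scores = list(bm25_scores)
    if normalize and scores:
        lo, hi = min(scores), max(scores)
        span = hi - lo
        # Min-max scale to [0, 1]; if all scores are equal, map them to 0.0.
        scores = [(s - lo) / span if span else 0.0 for s in scores]
    lines = [f"Query: {query}", "", "Passages:"]
    for i, (text, score) in enumerate(zip(passages, scores), start=1):
        lines.append(f"[{i}] (BM25: {score:.2f}) {text}")
    lines.append("")
    lines.append("Rank the passages by relevance to the query, most relevant first.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rerank_prompt(
        "why does ice float",
        ["Ice is less dense than liquid water.", "Boats float on water."],
        [7.2, 3.1],
    ))
```

Setting `normalize=False` keeps the raw scores, which is one way to vary the score scale; shuffling `passages` and `bm25_scores` together before calling the function would vary the document order.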
-
Alignment Quality Index (AQI) : Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations
Authors:
Abhilekh Borah,
Chhavi Sharma,
Danush Khanna,
Utkarsh Bhatt,
Gurpreet Singh,
Hasnat Md Abdullah,
Raghav Kaushik Ravi,
Vinija Jain,
Jyoti Patel,
Shubham Singh,
Vasu Sharma,
Arpita Vats,
Rahul Raja,
Aman Chadha,
Amitava Das
Abstract:
Alignment is no longer a luxury, it is a necessity. As large language models (LLMs) enter high-stakes domains like education, healthcare, governance, and law, their behavior must reliably reflect human-aligned values and safety constraints. Yet current evaluations rely heavily on behavioral proxies such as refusal rates, G-Eval scores, and toxicity classifiers, all of which have critical blind spots. Aligned models are often vulnerable to jailbreaking, stochasticity of generation, and alignment faking.
To address this issue, we introduce the Alignment Quality Index (AQI). This novel geometric and prompt-invariant metric empirically assesses LLM alignment by analyzing the separation of safe and unsafe activations in latent space. By combining measures such as the Davies-Bouldin Score (DBS), Dunn Index (DI), Xie-Beni Index (XBI), and Calinski-Harabasz Index (CHI) across various formulations, AQI captures clustering quality to detect hidden misalignments and jailbreak risks, even when outputs appear compliant. AQI also serves as an early warning signal for alignment faking, offering a robust, decoding-invariant tool for behavior-agnostic safety auditing.
Additionally, we propose the LITMUS dataset to facilitate robust evaluation under these challenging conditions. Empirical tests on LITMUS across different models trained under DPO, GRPO, and RLHF conditions demonstrate AQI's correlation with external judges and ability to reveal vulnerabilities missed by refusal metrics. We make our implementation publicly available to foster future research in this area.
Submitted 16 June, 2025;
originally announced June 2025.
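Of the cluster-quality measures named in the abstract above, the Dunn Index is the simplest to state: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The sketch below computes it on toy 2-D points standing in for pooled "safe" and "unsafe" activations. The data and the function name `dunn_index` are illustrative assumptions, and this is not the AQI formulation itself.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index: smallest inter-cluster distance / largest cluster diameter.

    `clusters` is a list of at least two clusters, each a list of points.
    """
    # Largest distance between two points inside the same cluster.
    max_diameter = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between points that belong to different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter


if __name__ == "__main__":
    safe = [(0.0, 0.0), (0.0, 1.0)]
    unsafe_far = [(5.0, 0.0), (5.0, 1.0)]
    unsafe_near = [(0.5, 0.0), (0.5, 1.0)]
    print(dunn_index([safe, unsafe_far]))   # well-separated clusters
    print(dunn_index([safe, unsafe_near]))  # clusters that nearly overlap
```

A higher value means tighter, better-separated clusters, so in this toy setting the nearly overlapping pair scores lower than the well-separated pair.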
-
Excitations and dynamical structure factor of $J_1-J_2$ spin-$3/2$ and spin-$5/2$ Heisenberg spin chains
Authors:
Aman Sharma,
Mithilesh Nayak,
Natalia Chepiga,
Frédéric Mila
Abstract:
We study the dynamical structure factor of the frustrated spin-$3/2$ $J_1$-$J_2$ Heisenberg chains, with particular focus on the partially dimerized phase that emerges between two Kosterlitz-Thouless transitions. Using a valence bond solid ansatz corroborated by density matrix renormalization group simulations, we investigate the nature of magnon and spinon excitations through the single-mode approximation. We show that the magnon develops an incommensurate dispersion at $J_2 \approx 0.32J_1$, while the spinons, viewed as domain walls between degenerate valence bond solid states, become incommensurate at $J_2 \approx 0.4J_1$ beyond the Lifshitz point ($J_2 \approx 0.388J_1$). The dynamical structure factor exhibits rich spectral features shaped by the interplay between these excitations, with magnons appearing as resonances embedded in the spinon continuum. The spinon gap shows a nonmonotonic behavior, reaching a peak near the center of the partially dimerized phase and closing at the boundaries, suggesting the appearance of a floating phase as a result of the condensation of incommensurate spinons. Comparative analysis with the spin-$5/2$ case confirms the universality of these phenomena across half-integer higher-spin systems. Our results provide detailed insight into how fractionalization and incommensurate condensation govern the spectral properties of frustrated spin chains, offering a unified picture across different spin magnitudes.
Submitted 16 June, 2025;
originally announced June 2025.
-
ImmunoFOMO: Are Language Models missing what oncologists see?
Authors:
Aman Sinha,
Bogdan-Valentin Popescu,
Xavier Coubez,
Marianne Clausel,
Mathieu Constant
Abstract:
The capabilities of language models (LMs) have grown at a fast pace over the past decade, leading researchers in various disciplines, such as biomedical research, to increasingly explore the utility of LMs in their day-to-day applications. Domain-specific language models have already been in use for biomedical natural language processing (NLP) applications. Recently, however, interest has grown in medical language models and their understanding capabilities. In this paper, we investigate the medical conceptual grounding of various language models against expert clinicians for the identification of hallmarks of immunotherapy in breast cancer abstracts. Our results show that pre-trained language models have the potential to outperform large language models in identifying very specific (low-level) concepts.
Submitted 13 June, 2025;
originally announced June 2025.
-
CarbonSet: A Dataset to Analyze Trends and Benchmark the Sustainability of CPUs and GPUs
Authors:
Jiajun Hu,
Chetan Choppali Sudarshan,
Vidya A. Chhabria,
Aman Arora
Abstract:
Over the years, the chip industry has consistently developed high-performance processors to address the increasing demands across diverse applications. However, the rapid expansion of chip production has significantly increased carbon emissions, raising critical concerns about environmental sustainability. While researchers have previously modeled the carbon footprint (CFP) at both system and processor levels, a holistic analysis of sustainability trends encompassing the entire chip lifecycle remains lacking. This paper presents CarbonSet, a comprehensive dataset integrating sustainability and performance metrics for CPUs and GPUs over the past decade. CarbonSet aims to benchmark and assess the design of next-generation processors. Leveraging this dataset, we conduct a detailed analysis of flagship processors' sustainability trends over the last decade. This paper further highlights that modern processors are not yet sustainably designed, with total carbon emissions increasing more than 50$\times$ in the past three years due to the surging demand driven by the AI boom. Power efficiency remains a significant concern, while advanced process nodes pose new challenges, requiring effective amortization of the dramatically increased manufacturing carbon emissions.
Submitted 12 June, 2025;
originally announced June 2025.
-
AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)
Authors:
Danush Khanna,
Krishna Kumar,
Basab Ghosh,
Vinija Jain,
Vasu Sharma,
Aman Chadha,
Amitava Das
Abstract:
Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent, thereby evading surface-level defenses like Direct Preference Optimization (DPO), which remain blind to the latent geometry. We introduce ALKALI, the first rigorously curated adversarial benchmark and the most comprehensive to date, spanning 9,000 prompts across three macro categories, six subtypes, and fifteen attack families. Evaluation of 21 leading LLMs reveals alarmingly high Attack Success Rates (ASRs) across both open- and closed-source models, exposing an underlying vulnerability we term latent camouflage, a structural blind spot where adversarial completions mimic the latent geometry of safe ones. To mitigate this vulnerability, we introduce GRACE (Geometric Representation Aware Contrastive Enhancement), an alignment framework coupling preference learning with latent space regularization. GRACE enforces two constraints: latent separation between safe and adversarial completions, and adversarial cohesion among unsafe and jailbreak behaviors. These operate over layerwise pooled embeddings guided by a learned attention profile, reshaping internal geometry without modifying the base model, and achieve up to 39% ASR reduction. Moreover, we introduce AVQI, a geometry-aware metric that quantifies latent alignment failure via cluster separation and compactness. AVQI reveals when unsafe completions mimic the geometry of safe ones, offering a principled lens into how models internally encode safety. We make the code publicly available at https://anonymous.4open.science/r/alkali-B416/README.md.
Submitted 11 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Neural and Cognitive Impacts of AI: The Influence of Task Subjectivity on Human-LLM Collaboration
Authors:
Matthew Russell,
Aman Shah,
Giles Blaney,
Judith Amores,
Mary Czerwinski,
Robert J. K. Jacob
Abstract:
AI-based interactive assistants are advancing human-augmenting technology, yet their effects on users' mental and physiological states remain under-explored. We address this gap by analyzing how Copilot for Microsoft Word, an LLM-based assistant, impacts users. Using tasks ranging from objective (SAT reading comprehension) to subjective (personal reflection), and with measurements including fNIRS, Empatica E4, NASA-TLX, and questionnaires, we measure Copilot's effects on users. We also evaluate users' performance with and without Copilot across tasks. In objective tasks, participants reported a reduction of workload and an increase in enjoyment, which was paired with objective performance increases. Participants reported reduced workload and increased enjoyment with no change in performance in a creative poetry writing task. However, no benefits due to Copilot use were reported in a highly subjective self-reflection task. Although no physiological changes were recorded due to Copilot use, task-dependent differences in prefrontal cortex activation offer complementary insights into the cognitive processes associated with successful and unsuccessful human-AI collaboration. These findings suggest that AI assistants' effectiveness varies with task type (particularly showing decreased usefulness in tasks that engage episodic memory) and present a brain-network-based hypothesis of human-AI collaboration.
Submitted 4 June, 2025;
originally announced June 2025.
-
Multilingual Information Retrieval with a Monolingual Knowledge Base
Authors:
Yingying Zhuang,
Aman Gupta,
Anurag Beniwal
Abstract:
Multilingual information retrieval has emerged as a powerful tool for expanding knowledge sharing across languages. However, high-quality knowledge bases are often scarce and available in only a few languages. An effective embedding model that maps sentences from different languages into the same feature vector space as the knowledge base language therefore becomes the key ingredient for cross-language knowledge sharing, especially for transferring knowledge available in high-resource languages to low-resource ones. In this paper, we propose a novel strategy to fine-tune multilingual embedding models with weighted sampling for contrastive learning, enabling multilingual information retrieval with a monolingual knowledge base. We demonstrate that the weighted sampling strategy produces performance gains over standard ones of up to 31.03% in MRR and up to 33.98% in Recall@3. Additionally, our proposed methodology is language-agnostic and applicable to both multilingual and code-switching use cases.
Submitted 3 June, 2025;
originally announced June 2025.
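The abstract above reports gains in MRR and Recall@3. For readers unfamiliar with these metrics, the following minimal sketch shows how they are commonly computed from ranked retrieval results. The document IDs and queries are made up for illustration and do not come from the paper.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean(values):
    """Arithmetic mean, returning 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Each entry: (ranked result IDs for one query, set of relevant IDs).
    runs = [
        (["d3", "d1", "d2"], {"d1"}),
        (["d5", "d6", "d7", "d4"], {"d4"}),
    ]
    mrr = mean(reciprocal_rank(r, rel) for r, rel in runs)
    recall3 = mean(recall_at_k(r, rel, 3) for r, rel in runs)
    print(f"MRR={mrr:.3f} Recall@3={recall3:.3f}")
```

MRR rewards placing the first relevant document near the top, while Recall@3 only asks whether relevant documents appear anywhere in the first three results.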
-
Ultrahigh-Q Torsional Nanomechanics through Bayesian Optimization
Authors:
Atkin D. Hyatt,
Aman R. Agrawal,
Christian M. Pluchar,
Charles A. Condos,
Dalziel J. Wilson
Abstract:
Recently it was discovered that torsion modes of strained nanoribbons exhibit dissipation dilution, giving a route to enhanced torque sensing and quantum optomechanics experiments. As with all strained nanomechanical resonators, an important limitation is bending loss due to mode curvature at the clamps. Here we use Bayesian optimization to design nanoribbons with optimal dissipation dilution of the fundamental torsion mode. Applied to centimeter-scale Si$_3$N$_4$ nanoribbons, we realize $Q$ factors exceeding 100 million and $Q$-frequency products exceeding $10^{13}$ Hz at room temperature. The thermal torque sensitivity of the reported devices is at the level of $10^{-20}\;\text{N}\,\text{m}/\sqrt{\text{Hz}}$ and the zero point angular displacement spectral density is at the level of $10^{-10}\;\text{rad}/\sqrt{\text{Hz}}$; they are moreover simple to fabricate, have high thermal conductivity, and can be heavily mass-loaded without diminishing their $Q$, making them attractive for diverse fundamental and applied weak force sensing tasks.
Submitted 2 June, 2025;
originally announced June 2025.
-
MIR: Methodology Inspiration Retrieval for Scientific Research Problems
Authors:
Aniketh Garikaparthi,
Manasi Patwardhan,
Aditya Sanjiv Kanade,
Aman Hassan,
Lovekesh Vig,
Arman Cohan
Abstract:
There has been a surge of interest in harnessing the reasoning capabilities of Large Language Models (LLMs) to accelerate scientific discovery. While existing approaches rely on grounding the discovery process within the relevant literature, effectiveness varies significantly with the quality and nature of the retrieved literature. We address the challenge of retrieving prior work whose concepts can inspire solutions for a given research problem, a task we define as Methodology Inspiration Retrieval (MIR). We construct a novel dataset tailored for training and evaluating retrievers on MIR, and establish baselines. To address MIR, we build the Methodology Adjacency Graph (MAG), which captures methodological lineage through citation relationships. We leverage MAG to embed an "intuitive prior" into dense retrievers for identifying patterns of methodological inspiration beyond superficial semantic similarity. This achieves significant gains of +5.4 in Recall@3 and +7.8 in Mean Average Precision (mAP) over strong baselines. Further, we adapt LLM-based re-ranking strategies to MIR, yielding additional improvements of +4.5 in Recall@3 and +4.8 in mAP. Through extensive ablation studies and qualitative analyses, we exhibit the promise of MIR in enhancing automated scientific discovery and outline avenues for advancing inspiration-driven retrieval.
Submitted 30 May, 2025;
originally announced June 2025.
-
Can Large Language Models Infer Causal Relationships from Real-World Text?
Authors:
Ryan Saklad,
Aman Chadha,
Oleg Pavlov,
Raha Moraffah
Abstract:
Understanding and inferring causal relationships from texts is a core aspect of human cognition and is essential for advancing large language models (LLMs) towards artificial general intelligence. Existing work primarily focuses on synthetically generated texts which involve simple causal relationships explicitly mentioned in the text. This fails to reflect the complexities of real-world tasks. In this paper, we investigate whether LLMs are capable of inferring causal relationships from real-world texts. We develop a benchmark drawn from real-world academic literature which includes diverse texts with respect to length, complexity of relationships (different levels of explicitness, number of events, and causal relationships), and domains and sub-domains. To the best of our knowledge, our benchmark is the first-ever real-world dataset for this task. Our experiments on state-of-the-art LLMs evaluated on our proposed benchmark demonstrate significant challenges, with the best-performing model achieving an average F1 score of only 0.477. Analysis reveals common pitfalls: difficulty with implicitly stated information, with distinguishing relevant causal factors from surrounding contextual details, and with connecting causally relevant information spread across lengthy textual passages. By systematically characterizing these deficiencies, our benchmark offers targeted insights for further research into advancing LLM causal reasoning.
Submitted 24 May, 2025;
originally announced May 2025.
-
Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025)
Authors:
Damilare Emmanuel Olatunji,
Julius Dona Zannu,
Carine Pierrette Mukamakuza,
Godbright Nixon Uiso,
Chol Buol,
Mona Mamoun Mubarak Aman,
John Bosco Thuo,
Nchofon Tagha Ghogomu,
Evelyne Umubyeyi
Abstract:
AI-powered stethoscopes offer a promising alternative for screening rheumatic heart disease (RHD), particularly in regions with limited diagnostic infrastructure. Early detection is vital, yet echocardiography, the gold standard tool, remains largely inaccessible in low-resource settings due to cost and workforce constraints. This review systematically examines machine learning (ML) applications from 2015 to 2025 that analyze electrocardiogram (ECG) and phonocardiogram (PCG) data to support accessible, scalable screening of all RHD variants in relation to the World Heart Federation's "25 by 25" goal to reduce RHD mortality. Using PRISMA-ScR guidelines, 37 peer-reviewed studies were selected from PubMed, IEEE Xplore, Scopus, and Embase. Convolutional neural networks (CNNs) dominate recent efforts, achieving a median accuracy of 97.75%, F1-score of 0.95, and AUROC of 0.89. However, challenges remain: 73% of studies used single-center datasets, 81.1% relied on private data, only 10.8% were externally validated, and none assessed cost-effectiveness. Although 45.9% originated from endemic regions, few addressed demographic diversity or implementation feasibility. These gaps underscore the disconnect between model performance and clinical readiness. Bridging this divide requires standardized benchmark datasets, prospective trials in endemic areas, and broader validation. If these issues are addressed, AI-augmented auscultation could transform cardiovascular diagnostics in underserved populations, thereby aiding early detection. This review also offers practical recommendations for building accessible ML-based RHD screening tools, aiming to close the diagnostic gap in low-resource settings where conventional auscultation may miss up to 90% of cases and echocardiography remains out of reach.
Submitted 1 July, 2025; v1 submitted 17 May, 2025;
originally announced May 2025.
-
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
Authors:
Shaina Raza,
Rizwan Qureshi,
Marcelo Lotif,
Aman Chadha,
Deval Pandya,
Christos Emmanouilidis
Abstract:
Generative AI models often learn and reproduce false information present in their training corpora. This position paper argues that, analogous to biological immunization, where controlled exposure to a weakened pathogen builds immunity, AI models should be fine-tuned on small, quarantined sets of explicitly labeled falsehoods as a "vaccine" against misinformation. These curated false examples are periodically injected during fine-tuning, strengthening the model's ability to recognize and reject misleading claims while preserving accuracy on truthful inputs. An illustrative case study shows that immunized models generate substantially less misinformation than baselines. To our knowledge, this is the first training framework that treats fact-checked falsehoods themselves as a supervised vaccine, rather than relying on input perturbations or generic human feedback signals, to harden models against future misinformation. We also outline ethical safeguards and governance controls to ensure the safe use of false data. Model immunization offers a proactive paradigm for aligning AI systems with factuality.
Submitted 23 May, 2025;
originally announced May 2025.
-
Swin Transformer for Robust CGI Images Detection: Intra- and Inter-Dataset Analysis across Multiple Color Spaces
Authors:
Preeti Mehta,
Aman Sagar,
Suchi Kumari
Abstract:
This study aims to address the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images across three different color spaces: RGB, YCbCr, and HSV. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer-based model for accurate differentiation between natural and synthetic images. The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features for distinguishing CGI from natural images. Its performance was assessed through intra- and inter-dataset testing across three datasets: CiFAKE, JSSSTU, and Columbia. The model was evaluated individually on each dataset (D1, D2, D3) and on the combined datasets (D1+D2+D3) to test its robustness and domain generalization. To address dataset imbalance, data augmentation techniques were applied. Additionally, t-SNE visualization was used to demonstrate the feature separability achieved by the Swin Transformer across the selected color spaces. The model's performance was tested across all color schemes, with the RGB color scheme yielding the highest accuracy for each dataset. As a result, RGB was selected for domain generalization analysis and compared with the CNN-based models VGG-19 and ResNet-50. The comparative results demonstrate the proposed model's effectiveness in detecting CGI, highlighting its robustness and reliability in both intra-dataset and inter-dataset evaluations. The findings of this study highlight the Swin Transformer model's potential as an advanced tool for digital image forensics, particularly in distinguishing CGI from natural images. The model's strong performance indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.
Submitted 22 May, 2025;
originally announced May 2025.
-
The Stablecoin Discount: Evidence of Tether's U.S. Treasury Bill Market Share in Lowering Yields
Authors:
Lennart Ante,
Aman Saggu,
Ingo Fiedler
Abstract:
Stablecoins represent a critical bridge between cryptocurrency and traditional finance, with Tether (USDT) dominating the sector as the largest stablecoin by market capitalization. By Q1 2025, Tether directly held approximately $98.5 billion in U.S. Treasury bills, representing 1.6% of all outstanding Treasury bills, making it one of the largest non-sovereign buyers in this crucial asset class, on par with nation-state-level investors. This paper investigates how Tether's market share of U.S. Treasury bills influences corresponding yields. The baseline semi-log time trend model finds that a 1% increase in Tether's market share is associated with a 1-month yield reduction of 3.8%, corresponding to 14-16 basis points. However, threshold regression analysis reveals a critical market share threshold of 0.973%, above which the yield impact intensifies significantly. In this high regime, a 1% market share increase reduces 1-month yields by 6.3%. At the end of Q1 2025, Tether's market share placed it firmly within this high-impact regime, reducing 1-month yields by around 24 basis points relative to a counterfactual. In absolute terms, Tether's demand for Treasury Bills equates to roughly $15 billion in annual interest savings for the U.S. government. Aligning with theories of liquidity saturation and nonlinear price impact, these results highlight that stablecoin demand can reduce sovereign funding costs and provide a potential buffer against market shocks.
Submitted 18 May, 2025;
originally announced May 2025.
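The abstract above pairs relative yield reductions (3.8%, 6.3%) with basis-point figures (14-16 bp, around 24 bp). A back-of-the-envelope check, assuming the percentages are relative to the prevailing 1-month yield (an interpretation, not something the abstract states), recovers the yield level at which the two descriptions agree. The function name is hypothetical.

```python
def implied_yield_percent(relative_reduction, reduction_bp):
    """Yield level (in %) at which a relative reduction equals reduction_bp.

    Assumes reduction_bp = relative_reduction * yield_in_bp, i.e. the
    percentage effect is measured relative to the yield itself.
    """
    yield_bp = reduction_bp / relative_reduction
    return yield_bp / 100.0  # 100 basis points = 1 percentage point


if __name__ == "__main__":
    for rel, bp in [(0.038, 14), (0.038, 16), (0.063, 24)]:
        level = implied_yield_percent(rel, bp)
        print(f"{rel:.1%} reduction = {bp} bp at a yield of about {level:.2f}%")
```

Under this reading, all three pairs are consistent with 1-month yields of roughly 3.7% to 4.2%.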
-
Effects of Coupling Between Chiral Vibrations and Spins in Molecular Magnets
Authors:
Aman Ullah,
Sergey A. Varganov,
Yafis Barlas
Abstract:
In single molecular magnets, chiral vibrations carrying vibrational angular momentum ($\hat{L}^{\text{vib}}$) emerge due to the splitting of a doubly degenerate vibrational mode. Here, we identify a new type of effective spin-vibrational coupling responsible for lifting this degeneracy, which can facilitate optically selective excitations. In the presence of an external Zeeman field, this coupling breaks both inversion (in-plane parity) $\mathcal{P}$ and time-reversal $\mathcal{T}$ symmetries, imparting distinct geometric phases to the resulting dressed spin-vibronic states. The wave function of the spin-vibronic state is characterized by a $\pi$-Berry phase, which results in magneto-optical circular dichroism. This framework is validated using density functional theory and multi-reference \emph{ab initio} calculations on the Ce(trenovan) molecular magnet.
Submitted 16 May, 2025;
originally announced May 2025.
-
AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques
Authors:
Aman Raj,
Lakshit Arora,
Sanjay Surendranath Girija,
Shashank Kapoor,
Dipen Pradhan,
Ankit Shetgaonkar
Abstract:
Natural disasters, including earthquakes, wildfires and cyclones, pose a huge risk to human lives as well as infrastructure assets. An effective response to disaster depends on the ability to rapidly and efficiently assess the intensity of damage. Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) present a breakthrough solution, capable of combining knowledge from multiple types and sources of data, simulating realistic disaster scenarios, and identifying emerging trends at a speed previously unimaginable. In this paper, we present a comprehensive review of the prospects of AI and GenAI in damage assessment for various natural disasters, highlighting both their strengths and limitations. We discuss their application to multimodal data such as text, image, video, and audio, and also cover major issues of data privacy, security, and ethical use of the technology during crises. The paper also recognizes the threat of GenAI misuse, in the form of dissemination of misinformation and adversarial attacks. Finally, we outline avenues of future research, emphasizing the need for secure, reliable, and ethical GenAI systems for disaster management in general. We believe that this work represents the first comprehensive survey of GenAI techniques being used in the field of Disaster Assessment and Response.
Submitted 12 May, 2025;
originally announced May 2025.
-
Opportunities and Applications of GenAI in Smart Cities: A User-Centric Survey
Authors:
Ankit Shetgaonkar,
Dipen Pradhan,
Lakshit Arora,
Sanjay Surendranath Girija,
Shashank Kapoor,
Aman Raj
Abstract:
The proliferation of IoT in cities, combined with Digital Twins, creates a rich data foundation for Smart Cities aimed at improving urban life and operations. Generative AI (GenAI) significantly enhances this potential, moving beyond traditional AI analytics and predictions by processing multimodal content and generating novel outputs like text and simulations. Using specialized or foundational models, GenAI's natural language abilities such as Natural Language Understanding (NLU) and Natural Language Generation (NLG) can power tailored applications and unified interfaces, dramatically lowering barriers for users interacting with complex smart city systems. In this paper, we focus on GenAI applications based on conversational interfaces within the context of three critical user archetypes in a Smart City - Citizens, Operators and Planners. We identify and review GenAI models and techniques that have been proposed or deployed for various urban subsystems in the contexts of these user archetypes. We also consider how GenAI can be built on the existing data foundation of official city records, IoT data streams and Urban Digital Twins. We believe this work represents the first comprehensive summarization of GenAI techniques for Smart Cities from the lens of the critical users in a Smart City.
Submitted 12 May, 2025;
originally announced May 2025.
-
Neural Signatures Within and Between Chess Puzzle Solving and Standard Cognitive Tasks for Brain-Computer Interfaces: A Low-Cost Electroencephalography Study
Authors:
Matthew Russell,
Samuel Youkeles,
William Xia,
Kenny Zheng,
Aman Shah,
Robert J. K. Jacob
Abstract:
Consumer-grade electroencephalography (EEG) devices show promise for Brain-Computer Interface (BCI) applications, but their efficacy in detecting subtle cognitive states remains understudied. We developed a comprehensive study paradigm which incorporates a combination of established cognitive tasks (N-Back, Stroop, and Mental Rotation) and adds a novel ecological Chess puzzles task. We tested our paradigm with the MUSE 2, a low-cost consumer-grade EEG device. Using linear mixed-effects modeling we demonstrate successful distinctions of within-task workload levels and cross-task cognitive states based on the spectral power data derived from the MUSE 2 device. With machine learning we further show reliable predictive power to differentiate between workload levels in the N-Back task, and also achieve effective cross-task classification. These findings demonstrate that consumer-grade EEG devices like the MUSE 2 can be used to effectively differentiate between various levels of cognitive workload as well as among more nuanced task-based cognitive states, and that these tools can be leveraged for real-time adaptive BCI applications in practical settings.
Submitted 3 June, 2025; v1 submitted 12 May, 2025;
originally announced May 2025.
-
On the Robustness of Reward Models for Language Model Alignment
Authors:
Jiwoo Hong,
Noah Lee,
Eunki Kim,
Guijin Son,
Woojin Chung,
Aman Gupta,
Shao Tang,
James Thorne
Abstract:
The Bradley-Terry (BT) model is widely practiced in reward modeling for reinforcement learning with human feedback (RLHF). Despite its effectiveness, reward models (RMs) trained with BT model loss are prone to over-optimization, losing generalizability to unseen input distributions. In this paper, we study the cause of over-optimization in RM training and its downstream effects on the RLHF procedure, accentuating the importance of distributional robustness of RMs in unseen data. First, we show that the excessive dispersion of hidden state norms is the main source of over-optimization. Then, we propose batch-wise sum-to-zero regularization (BSR) to enforce zero-centered reward sum per batch, constraining the rewards with extreme magnitudes. We assess the impact of BSR in improving robustness in RMs through four scenarios of over-optimization, where BSR consistently manifests better robustness. Subsequently, we compare the plain BT model and BSR on RLHF training and empirically show that robust RMs better align the policy to the gold preference model. Finally, we apply BSR to high-quality data and models, which surpasses state-of-the-art RMs in the 8B scale by adding more than 5% in complex preference prediction tasks. By conducting RLOO training with 8B RMs, AlpacaEval 2.0 reduces generation length by 40% while adding a 7% increase in win rate, further highlighting that robustness in RMs induces robustness in RLHF training. We release the code, data, and models: https://github.com/LinkedIn-XFACT/RM-Robustness.
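The regularizer described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract only says that BSR enforces a zero-centered reward sum per batch, so the squared-sum penalty form, the weight `lam`, and all function names below are assumptions made for this sketch. The released repository should be consulted for the actual formulation.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) for a single float.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bt_loss(chosen, rejected):
    # Mean Bradley-Terry negative log-likelihood over a batch of
    # (chosen, rejected) reward pairs.
    pairs = list(zip(chosen, rejected))
    return -sum(log_sigmoid(c - r) for c, r in pairs) / len(pairs)


def bsr_penalty(chosen, rejected):
    # ASSUMED form of the regularizer: the squared sum of every reward
    # in the batch, which is zero exactly when the rewards sum to zero.
    total = sum(chosen) + sum(rejected)
    return total ** 2


def bt_bsr_loss(chosen, rejected, lam=0.01):
    # BT loss plus the assumed sum-to-zero penalty.
    # `lam` is a hypothetical weight, not a value from the paper.
    return bt_loss(chosen, rejected) + lam * bsr_penalty(chosen, rejected)
```

Under this assumed form, a batch whose rewards already sum to zero incurs no penalty, while a batch with uniformly large rewards is pushed back toward a zero-centered scale without changing the pairwise margins that the BT term depends on.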
Submitted 12 May, 2025;
originally announced May 2025.
-
Explainable Artificial Intelligence Techniques for Software Development Lifecycle: A Phase-specific Survey
Authors:
Lakshit Arora,
Sanjay Surendranath Girija,
Shashank Kapoor,
Aman Raj,
Dipen Pradhan,
Ankit Shetgaonkar
Abstract:
Artificial Intelligence (AI) is rapidly expanding and integrating more into daily life to automate tasks, guide decision making, and enhance efficiency. However, complex AI models, which make decisions without providing clear explanations (known as the "black-box problem"), currently restrict trust and widespread adoption of AI. Explainable Artificial Intelligence (XAI) has emerged to address the black-box problem by making AI systems more interpretable and transparent so stakeholders can trust, verify, and act upon AI-based outcomes. Researchers have developed various techniques to foster XAI in the Software Development Lifecycle. However, there are gaps in applying XAI techniques in the Software Engineering phases. A literature review shows that 68% of XAI in Software Engineering research is focused on maintenance as opposed to 8% on software management and requirements. In this paper, we present a comprehensive survey of the applications of XAI methods such as concept-based explanations, Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), rule extraction, attention mechanisms, counterfactual explanations, and example-based explanations to the different phases of the Software Development Life Cycle (SDLC), including requirements elicitation, design and development, testing and deployment, and evolution. To the best of our knowledge, this paper presents the first comprehensive survey of XAI techniques for every phase of the Software Development Life Cycle (SDLC). This survey aims to promote explainable AI in Software Engineering and facilitate the practical application of complex AI models in AI-driven software development.
Submitted 11 May, 2025;
originally announced May 2025.
-
DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions
Authors:
Shashank Agnihotri,
Amaan Ansari,
Annika Dackermann,
Fabian Rösch,
Margret Keuper
Abstract:
Deep learning (DL) has surpassed human performance on standard benchmarks, driving its widespread adoption in computer vision tasks. One such task is disparity estimation, estimating the disparity between matching pixels in stereo image pairs, which is crucial for safety-critical applications like medical surgeries and autonomous navigation. However, DL-based disparity estimation methods are highly susceptible to distribution shifts and adversarial attacks, raising concerns about their reliability and generalization. Despite these concerns, a standardized benchmark for evaluating the robustness of disparity estimation methods remains absent, hindering progress in the field.
To address this gap, we introduce DispBench, a comprehensive benchmarking tool for systematically assessing the reliability of disparity estimation methods. DispBench evaluates robustness against synthetic image corruptions such as adversarial attacks and out-of-distribution shifts caused by 2D Common Corruptions across multiple datasets and diverse corruption scenarios. We conduct the most extensive performance and robustness analysis of disparity estimation methods to date, uncovering key correlations between accuracy, reliability, and generalization. Open-source code for DispBench: https://github.com/shashankskagnihotri/benchmarking_robustness/tree/disparity_estimation/final/disparity_estimation
Submitted 8 May, 2025;
originally announced May 2025.
-
Integrating Communication, Sensing, and Security: Progress and Prospects of PLS in ISAC Systems
Authors:
Waqas Aman,
El-Mehdi Illi,
Marwa Qaraqe,
Saif Al-Kuwari
Abstract:
The sixth generation of wireless networks defined several key performance indicators (KPIs) for assessing its networks, mainly in terms of reliability, coverage, and sensing. In this regard, remarkable attention has been paid recently to the integrated sensing and communication (ISAC) paradigm as an enabler for efficiently and jointly performing communication and sensing using the same spectrum and hardware resources. On the other hand, ensuring communication and data security has been an imperative requirement for wireless networks throughout their evolution. The physical-layer security (PLS) concept paved the way to catering to the security needs in wireless networks in a sustainable way while guaranteeing theoretically secure transmissions, independently of the computational capacity of adversaries. Therefore, it is of paramount importance to consider a balanced trade-off between communication reliability, sensing, and security in future networks, such as the 5G and beyond, and the 6G. In this paper, we provide a comprehensive and system-wise review of designed secure ISAC systems from a PLS point of view. In particular, the impact of various physical-layer techniques, schemes, and wireless technologies to ensure the sensing-security trade-off is studied from the surveyed work. Furthermore, the amalgamation of PLS and ISAC is analyzed in a broader impact by considering attacks targeting data confidentiality, communication covertness, and sensing spoofing. The paper also serves as a tutorial by presenting several theoretical foundations on ISAC and PLS, which represent a practical guide for readers to develop novel secure ISAC network designs.
Submitted 8 May, 2025;
originally announced May 2025.
-
On the Vulnerability of Underwater Magnetic Induction Communication
Authors:
Muhammad Muzzammil,
Waqas Aman,
Irfan Ullah,
Shang Zhigang,
Saif Al-Kuwari,
Zhou Tian,
Marwa Qaraqe
Abstract:
Typical magnetic induction (MI) communication is commonly considered a secure underwater wireless communication (UWC) technology due to its non-audible and non-visible nature compared to acoustic and optical UWC technologies. However, vulnerabilities in communication systems inevitably exist and may lead to different types of attacks. In this paper, we investigate the eavesdropping attack in underwater MI communication to quantitatively measure the system's vulnerability under this attack. We consider different potential eavesdropping configuration setups based on the positions and orientations of the eavesdropper node to investigate how they impact the received voltage and secrecy at the legitimate receiver node. To this end, we develop finite-element-method-based simulation models for each configuration in an underwater environment and evaluate the received voltage and the secrecy capacity against different system parameters such as magnetic flux, magnetic flux density, distance, and orientation sensitivity. Furthermore, we construct an experimental setup within a laboratory environment to replicate the simulation experiments. Both simulation and lab experiments confirm the susceptibility of underwater MI communication to eavesdropping attacks. However, this vulnerability is highly dependent on the position and orientation of the coil between the eavesdropper and the legitimate transmitter. On the positive side, we also observe a unique behavior in the received coil reception that might be used to detect malicious node activities in the vicinity, which might lead to a potential security mechanism against eavesdropping attacks.
Submitted 7 May, 2025;
originally announced May 2025.
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Authors:
Shashank Kapoor,
Sanjay Surendranath Girija,
Lakshit Arora,
Dipen Pradhan,
Ankit Shetgaonkar,
Aman Raj
Abstract:
The introduction of multimodal models is a huge step forward in Artificial Intelligence. A single model is trained to understand multiple modalities: text, image, video, and audio. Open-source multimodal models have made these breakthroughs more accessible. However, considering the vast landscape of adversarial attacks across these modalities, these models also inherit vulnerabilities of all the modalities, and ultimately, the adversarial threat amplifies. While broad research is available on possible attacks within or across these modalities, a practitioner-focused view that outlines attack types remains absent in the multimodal world. As more Machine Learning Practitioners adopt, fine-tune, and deploy open-source models in real-world applications, it's crucial that they can view the threat landscape and take the preventive actions necessary. This paper addresses the gap by surveying adversarial attacks targeting all four modalities: text, image, video, and audio. This survey provides a view of the adversarial attack landscape and presents how multimodal adversarial threats have evolved. To the best of our knowledge, this survey is the first comprehensive summarization of the threat landscape in the multimodal world.
Submitted 5 May, 2025;
originally announced May 2025.
-
Bemba Speech Translation: Exploring a Low-Resource African Language
Authors:
Muhammad Hazim Al Farouq,
Aman Kassahun Wassie,
Yasmin Moslem
Abstract:
This paper describes our system submission to the International Conference on Spoken Language Translation (IWSLT 2025), low-resource languages track, namely for Bemba-to-English speech translation. We built cascaded speech translation systems based on Whisper and NLLB-200, and employed data augmentation techniques, such as back-translation. We investigate the effect of using synthetic data and discuss our experimental setup.
Submitted 2 June, 2025; v1 submitted 5 May, 2025;
originally announced May 2025.
-
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Authors:
Sanjay Surendranath Girija,
Shashank Kapoor,
Lakshit Arora,
Dipen Pradhan,
Aman Raj,
Ankit Shetgaonkar
Abstract:
Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: Knowledge Distillation, Model Quantization, and Model Pruning. For each technique, we discuss the underlying principles, present different variants, and provide examples of successful applications. We also briefly discuss complementary techniques such as mixture-of-experts and early-exit strategies. Finally, we highlight promising future directions, aiming to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment.
Submitted 8 May, 2025; v1 submitted 4 May, 2025;
originally announced May 2025.
-
Canonicalization for Unreproducible Builds in Java
Authors:
Aman Sharma,
Benoit Baudry,
Martin Monperrus
Abstract:
The increasing complexity of software supply chains and the rise of supply chain attacks have elevated concerns around software integrity. Users and stakeholders face significant challenges in validating that a given software artifact corresponds to its declared source. Reproducible Builds address this challenge by ensuring that independently performed builds from identical source code produce identical binaries. However, achieving reproducibility at scale remains difficult, especially in Java, due to a range of non-deterministic factors and caveats in the build process. In this work, we focus on reproducibility in Java-based software, archetypal of enterprise applications. We introduce a conceptual framework for reproducible builds, we analyze a large dataset from Reproducible Central, and we develop a novel taxonomy of six root causes of unreproducibility. We study actionable mitigations: artifact and bytecode canonicalization using OSS-Rebuild and jNorm respectively. Finally, we present Chains-Rebuild, a tool that raises reproducibility success from 9.48% to 26.89% on 12,283 unreproducible artifacts. To sum up, our contributions are the first large-scale taxonomy of build unreproducibility causes in Java, a publicly available dataset of unreproducible builds, and Chains-Rebuild, a canonicalization tool for mitigating unreproducible builds in Java.
Submitted 30 April, 2025;
originally announced April 2025.
-
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
Authors:
Paul Kassianik,
Baturay Saglam,
Alexander Chen,
Blaine Nelson,
Anu Vellore,
Massimo Aufiero,
Fraser Burch,
Dhruv Kedia,
Avi Zohary,
Sajana Weerawardhena,
Aman Priyanshu,
Adam Swanda,
Amy Chang,
Hyrum Anderson,
Kojin Oshiba,
Omar Santos,
Yaron Singer,
Amin Karbasi
Abstract:
As transformer-based large language models (LLMs) increasingly permeate society, they have revolutionized domains such as software engineering, creative writing, and digital arts. However, their adoption in cybersecurity remains limited due to challenges like scarcity of specialized training data and complexity of representing cybersecurity-specific knowledge. To address these gaps, we present Foundation-Sec-8B, a cybersecurity-focused LLM built on the Llama 3.1 architecture and enhanced through continued pretraining on a carefully curated cybersecurity corpus. We evaluate Foundation-Sec-8B across both established and new cybersecurity benchmarks, showing that it matches Llama 3.1-70B and GPT-4o-mini in certain cybersecurity-specific tasks. By releasing our model to the public, we aim to accelerate progress and adoption of AI-driven tools in both public and private cybersecurity contexts.
Submitted 28 April, 2025;
originally announced April 2025.
-
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation
Authors:
Amaan Izhar,
Nurul Japar,
Norisma Idris,
Ting Dang
Abstract:
Medical image reporting (MIR) aims to generate structured clinical descriptions from radiological images. Existing methods struggle with fine-grained feature extraction, multimodal alignment, and generalization across diverse imaging types, often relying on vanilla transformers and focusing primarily on chest X-rays. We propose MicarVLMoE, a vision-language mixture-of-experts model with gated cross-aligned fusion, designed to address these limitations. Our architecture includes: (i) a multiscale vision encoder (MSVE) for capturing anatomical details at varying resolutions, (ii) a multihead dual-branch latent attention (MDLA) module for vision-language alignment through latent bottleneck representations, and (iii) a modulated mixture-of-experts (MoE) decoder for adaptive expert specialization. We extend MIR to CT scans, retinal imaging, MRI scans, and gross pathology images, reporting state-of-the-art results on COVCTR, MMR, PGROSS, and ROCO datasets. Extensive experiments and ablations confirm improved clinical accuracy, cross-modal alignment, and model interpretability. Code is available at https://github.com/AI-14/micar-vl-moe.
Submitted 28 April, 2025;
originally announced April 2025.
-
Horizon-Driven Expansion from Hawking-Like Radiation: A Curvature-Coupled Cosmological Model
Authors:
Aman Singh
Abstract:
We propose a cosmological model in which the expansion of the universe is driven by a Hawking-like influx of energy across the cosmological horizon, rather than from a fixed cosmological constant. In place of a cosmological constant, we introduce source terms in the Friedmann and continuity equations that couple horizon curvature to matter and radiation densities. At high curvature (large Hubble parameter $H$), this influx strongly replenishes matter and radiation, slowing their adiabatic dilution. As curvature diminishes, the influx weakens, smoothly transitioning into standard radiation- or matter-dominated eras. This mechanism naturally suppresses spatial curvature without requiring an inflationary phase. It may also produce near-scale-invariant fluctuations via slowly varying horizon thermodynamics.
Submitted 28 April, 2025;
originally announced April 2025.
-
Unified and consistent structure growth measurements from joint ACT, SPT and \textit{Planck} CMB lensing
Authors:
Frank J. Qu,
Fei Ge,
W. L. Kimmy Wu,
Irene Abril-Cabezas,
Mathew S. Madhavacheril,
Marius Millea,
Ethan Anderes,
Adam J. Anderson,
Behzad Ansarinejad,
Melanie Archipley,
Zachary Atkins,
Lennart Balkenhol,
Nicholas Battaglia,
Karim Benabed,
Amy N. Bender,
Bradford A. Benson,
Federico Bianchini,
Lindsey E. Bleem,
Boris Bolliet,
J. Richard Bond,
François R. Bouchet,
Lincoln Bryant,
Erminia Calabrese,
Etienne Camphuis,
John E. Carlstrom
, et al. (120 additional authors not shown)
Abstract:
We present the tightest cosmic microwave background (CMB) lensing constraints to date on the growth of structure by combining CMB lensing measurements from the Atacama Cosmology Telescope (ACT), the South Pole Telescope (SPT) and \textit{Planck}. Each of these surveys individually provides lensing measurements with similarly high statistical power, achieving signal-to-noise ratios of approximately 40. The combined lensing bandpowers represent the most precise CMB lensing power spectrum measurement to date with a signal-to-noise ratio of 61 and an amplitude of $A_\mathrm{lens}^\mathrm{recon} = 1.025 \pm 0.017$ with respect to the theory prediction from the best-fit CMB \textit{Planck}-ACT cosmology. The bandpowers from all three lensing datasets, analyzed jointly, yield a $1.6\%$ measurement of the parameter combination $S_8^\mathrm{CMBL} \equiv σ_8\,(Ω_m/0.3)^{0.25} = 0.825^{+0.015}_{-0.013}$. Including Dark Energy Spectroscopic Instrument (DESI) Baryon Acoustic Oscillation (BAO) data improves the constraint on the amplitude of matter fluctuations to $σ_8 = 0.829 \pm 0.009$ (a $1.1\%$ determination). When combining with uncalibrated supernovae from \texttt{Pantheon+}, we present a $4\%$ sound-horizon-independent estimate of $H_0=66.4\pm2.5\,\mathrm{km\,s^{-1}\,Mpc^{-1}} $. The joint lensing constraints on structure growth and present-day Hubble rate are fully consistent with a $Λ$CDM model fit to the primary CMB data from \textit{Planck} and ACT. While the precise upper limit is sensitive to the choice of data and underlying model assumptions, when varying the neutrino mass sum within the $Λ\mathrm{CDM}$ cosmological model, the combination of primary CMB, BAO and CMB lensing drives the probable upper limit for the mass sum towards lower values, comparable to the minimum mass prior required by neutrino oscillation experiments.
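As a small worked illustration of the quantities quoted in this abstract, the sketch below evaluates the parameter combination $S_8^\mathrm{CMBL} \equiv σ_8\,(Ω_m/0.3)^{0.25}$ and the fractional uncertainty of a measurement. The function names are hypothetical, and averaging asymmetric error bars is an assumption made here for simplicity; the paper's own rounding convention may differ.

```python
def s8_cmbl(sigma8, omega_m, pivot=0.3, power=0.25):
    # S_8^CMBL = sigma_8 * (Omega_m / 0.3)^0.25, as defined in the abstract.
    return sigma8 * (omega_m / pivot) ** power


def fractional_uncertainty(value, err_plus, err_minus=None):
    # Fractional uncertainty of a measurement. For asymmetric errors the
    # two bars are averaged (an assumption made for this sketch).
    if err_minus is None:
        err_minus = err_plus
    return 0.5 * (err_plus + err_minus) / value


# Values quoted in the abstract.
sigma8_frac = fractional_uncertainty(0.829, 0.009)  # sigma_8 with DESI BAO
h0_frac = fractional_uncertainty(66.4, 2.5)         # H_0 in km/s/Mpc
```

Here `sigma8_frac` comes out at about 0.011 and `h0_frac` at about 0.038, in line with the quoted 1.1% and 4% determinations. At the pivot value $Ω_m = 0.3$, `s8_cmbl` reduces to $σ_8$ itself.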
Submitted 28 April, 2025;
originally announced April 2025.
-
Disentangling the global multiplicity and spectral shape fluctuations in radial flow
Authors:
Somadutta Bhatta,
Aman Dimri,
Jiangyong Jia
Abstract:
Radial flow, a key collective phenomenon in heavy-ion collisions, manifests through event-by-event fluctuations in transverse momentum ($p_{\mathrm{T}}$) spectra. The $p_{\mathrm{T}}$-differential radial flow, $v_0(p_{\mathrm{T}})$, initially conceived to capture local spectral shape fluctuations, is influenced by global multiplicity fluctuations. Using the HIJING model, we explore how different definitions of event activity for centrality and spectral normalization schemes affect $v_0(p_{\mathrm{T}})$. We find these methodological variations induce a constant offset in $v_0(p_{\mathrm{T}})$ without altering its shape, indicating that the dynamic $p_{\mathrm{T}}$-differential information on radial flow remains robust, but its absolute magnitude is meaningful only up to a baseline offset dictated by global multiplicity fluctuations.
Submitted 29 June, 2025; v1 submitted 28 April, 2025;
originally announced April 2025.
-
Self-sorting of bidisperse particles in evaporating sessile droplets
Authors:
Aman Kumar Jain,
Fabian Denner,
Berend van Wachem
Abstract:
This study investigates the dispersion and self-sorting dynamics of bidisperse particles, i.e., a mixture of two distinct particle sizes, during the evaporation of ethanol droplets on a heated substrate, focusing on the influence of surface wettability, Marangoni stresses, and relative particle density. To this end, numerical simulations are carried out using a two-stage numerical approach: the first stage simulates the gas-liquid flow along with the heat and vapor distribution, while the second stage models the particle behavior using Lagrangian particle tracking. The results reveal that for an ethanol droplet evaporating with a constant contact angle in the absence of thermocapillary Marangoni stresses, the flow induced by the receding motion of the contact line supersedes the capillary flow, moving the fluid from the contact line to the apex of the droplet. This flow moves the particles from the bulk of the droplet to the apex of the droplet and suppresses size-based self-sorting of the particles. However, in the presence of Marangoni stresses, a flow along the interface near the apex of the droplet promotes the self-sorting of particles based on their size, whereby smaller particles concentrate near the droplet apex and larger particles form an outer shell around them.
Submitted 23 April, 2025;
originally announced April 2025.