Search | arXiv e-print repository

arXiv:2505.20099 [pdf, ps, other]

Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities

Authors: Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, Haofen Wang

Abstract: Large language models (LLMs) have demonstrated remarkable performance on question-answering (QA) tasks because of their superior capabilities in natural language understanding and generation. However, LLM-based QA struggles with complex QA tasks due to poor reasoning capacity, outdated knowledge, and hallucinations. Several recent works synthesize LLMs and knowledge graphs (KGs) for QA to address the above challenges. In this survey, we propose a new structured taxonomy that categorizes the methodology of synthesizing LLMs and KGs for QA according to the categories of QA and the KG's role when integrating with LLMs. We systematically survey state-of-the-art advances in synthesizing LLMs and KGs for QA and compare and analyze these approaches in terms of strength, limitations, and KG requirements. We then align the approaches with QA and discuss how these approaches address the main challenges of different complex QA. Finally, we summarize the advancements, evaluation metrics, and benchmark datasets and highlight open challenges and opportunities.

Submitted 26 May, 2025; originally announced May 2025.

Comments: Under Review

arXiv:2505.19249 [pdf, ps, other]

RGC-Bent: A Novel Dataset for Bent Radio Galaxy Classification

Authors: Mir Sazzat Hossain, Khan Muhammad Bin Asad, Payaswini Saikia, Adrita Khan, Md Akil Raihan Iftee, Rakibul Hasan Rajib, Arshad Momen, Md Ashraful Amin, Amin Ahsan Ali, AKM Mahbubur Rahman

Abstract: We introduce a novel machine learning dataset tailored for the classification of bent radio active galactic nuclei (AGN) in astronomical observations. Bent radio AGN, distinguished by their curved jet structures, provide critical insights into galaxy cluster dynamics, interactions within the intracluster medium, and the broader physics of AGN. Despite their astrophysical significance, the classification of bent radio AGN remains a challenge due to the scarcity of specialized datasets and benchmarks. To address this, we present a dataset, derived from a well-recognized radio astronomy survey, that is designed to support the classification of NAT (Narrow-Angle Tail) and WAT (Wide-Angle Tail) categories, along with detailed data processing steps. We further evaluate the performance of state-of-the-art deep learning models on the dataset, including Convolutional Neural Networks (CNNs), and transformer-based architectures. Our results demonstrate the effectiveness of advanced machine learning models in classifying bent radio AGN, with ConvNeXT achieving the highest F1-scores for both NAT and WAT sources. By sharing this dataset and benchmarks, we aim to facilitate the advancement of research in AGN classification, galaxy cluster environments and galaxy evolution.

Submitted 25 May, 2025; originally announced May 2025.

Comments: 6 pages, 3 figures, 2 tables, Accepted in ICIP 2025

arXiv:2505.18450 [pdf, other]

BRIT: Bidirectional Retrieval over Unified Image-Text Graph

Authors: Ainulla Khan, Yamada Moyuru, Srinidhi Akella

Abstract: Retrieval-Augmented Generation (RAG) has emerged as a promising technique to enhance the quality and relevance of responses generated by large language models. While recent advancements have mainly focused on improving RAG for text-based queries, RAG on multi-modal documents containing both texts and images has not been fully explored, especially when fine-tuning does not work. This paper proposes BRIT, a novel multi-modal RAG framework that effectively unifies various text-image connections in the document into a multi-modal graph and retrieves the texts and images as a query-specific sub-graph. By traversing both image-to-text and text-to-image paths in the graph, BRIT retrieves not only directly query-relevant images and texts but also further relevant contents for answering complex cross-modal multi-hop questions. To evaluate the effectiveness of BRIT, we introduce the MM-RAG test set, specifically designed for multi-modal question answering tasks that require understanding the text-image relations. Our comprehensive experiments demonstrate the superiority of BRIT, highlighting its ability to handle cross-modal questions on the multi-modal documents.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.17421 [pdf, ps, other]

Adaptive Implicit-Based Deep Learning Channel Estimation for 6G Communications

Authors: Zhen Qiao, Jiang Xue, Junkai Zhang, Guanzhang Liu, Xiaoqin Ma, Runhua Li, Faheem A. Khan, John S. Thompson, Zongben Xu

Abstract: With the widespread deployment of fifth-generation (5G) wireless networks, research on sixth-generation (6G) technology is gaining momentum. Artificial Intelligence (AI) is anticipated to play a significant role in 6G, particularly through integration with the physical layer for tasks such as channel estimation. Considering resource limitations in real systems, the AI algorithm should be designed to have the ability to balance the accuracy and resource consumption according to the scenarios dynamically. However, conventional explicit multilayer-stacked Deep Learning (DL) models struggle to adapt due to their heavy reliance on the structure of deep neural networks. This article proposes an adaptive Implicit-layer DL Channel Estimation Network (ICENet) with a lightweight framework for vehicle-to-everything communications. This novel approach balances computational complexity and channel estimation accuracy by dynamically adjusting computational resources based on input data conditions, such as channel quality. Unlike explicit multilayer-stacked DL-based channel estimation models, ICENet offers a flexible framework, where specific requirements can be achieved by adaptively changing the number of iterations of the iterative layer. Meanwhile, ICENet requires less memory while maintaining high performance. The article concludes by highlighting open research challenges and promising future research directions.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2505.16477 [pdf]

Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery

Authors: Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy, Saso Dzeroski, Jesper Tegner, Hector Zenil

Abstract: With recent Nobel Prizes recognising AI contributions to science, Large Language Models (LLMs) are transforming scientific research by enhancing productivity and reshaping the scientific method. LLMs are now involved in experimental design, data analysis, and workflows, particularly in chemistry and biology. However, challenges such as hallucinations and reliability persist. In this contribution, we review how Large Language Models (LLMs) are redefining the scientific method and explore their potential applications across different stages of the scientific cycle, from hypothesis testing to discovery. We conclude that, for LLMs to serve as relevant and effective creative engines and productivity enhancers, their deep integration into all steps of the scientific process should be pursued in collaboration and alignment with human scientific goals, with clear evaluation metrics. The transition to AI-driven science raises ethical questions about creativity, oversight, and responsibility. With careful guidance, LLMs could evolve into creative engines, driving transformative breakthroughs across scientific disciplines responsibly and effectively. However, the scientific community must also decide how much it leaves to LLMs to drive science, even when associations with 'reasoning', mostly currently undeserved, are made in exchange for the potential to explore hypothesis and solution regions that might otherwise remain unexplored by human exploration alone.

Submitted 22 May, 2025; originally announced May 2025.

Comments: 45 pages

Journal ref: npj Artificial Intelligence, 2025

arXiv:2505.15063 [pdf, ps, other]

UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking

Authors: Sarfraz Ahmad, Hasan Iqbal, Momina Ahsan, Numaan Naeem, Muhammad Ahsan Riaz Khan, Arham Riaz, Muhammad Arslan Manzoor, Yuxia Wang, Preslav Nakov

Abstract: The rapid use of large language models (LLMs) has raised critical concerns regarding the factual reliability of their outputs, especially in low-resource languages such as Urdu. Existing automated fact-checking solutions overwhelmingly focus on English, leaving a significant gap for the 200+ million Urdu speakers worldwide. In this work, we introduce UrduFactCheck, the first comprehensive, modular fact-checking framework specifically tailored for Urdu. Our system features a dynamic, multi-strategy evidence retrieval pipeline that combines monolingual and translation-based approaches to address the scarcity of high-quality Urdu evidence. We curate and release two new hand-annotated benchmarks: UrduFactBench for claim verification and UrduFactQA for evaluating LLM factuality. Extensive experiments demonstrate that UrduFactCheck, particularly its translation-augmented variants, consistently outperforms baselines and open-source alternatives on multiple metrics. We further benchmark twelve state-of-the-art (SOTA) LLMs on factual question answering in Urdu, highlighting persistent gaps between proprietary and open-source models. UrduFactCheck's code and datasets are open-sourced and publicly available at https://github.com/mbzuai-nlp/UrduFactCheck.

Submitted 20 May, 2025; originally announced May 2025.

Comments: 16 pages, 10 figures, 4 tables, Submitted to ARR May 2025

ACM Class: I.2.7

arXiv:2505.13966 [pdf, other]

Waveform for Next Generation Communication Systems: Comparing Zak-OTFS with OFDM

Authors: Imran Ali Khan, Saif Khan Mohammed, Ronny Hadani, Ananthanarayanan Chockalingam, Robert Calderbank, Anton Monk, Shachar Kons, Shlomo Rakib, Yoav Hebron

Abstract: Across the world, there is growing interest in new waveforms, Zak-OTFS in particular, and over-the-air implementations are starting to appear. The choice between OFDM and Zak-OTFS is not so much a choice between waveforms as it is an architectural choice between preventing inter-carrier interference (ICI) and embracing ICI. In OFDM, once the Input-Output (I/O) relation is known, equalization is relatively simple, at least when there is no ICI. However, in the presence of ICI the I/O relation is non-predictable and its acquisition is non-trivial. In contrast, equalization is more involved in Zak-OTFS due to inter-symbol-interference (ISI), however the I/O relation is predictable and its acquisition is simple. Zak-OTFS exhibits superior performance in doubly-spread 6G use cases with high delay/Doppler channel spreads (i.e., high mobility and/or large cells), but architectural choice is governed by the typical use case, today and in the future. What is typical depends to some degree on geography, since large delay spread is a characteristic of large cells which are the rule rather than the exception in many important wireless markets. This paper provides a comprehensive performance comparison of cyclic prefix OFDM (CP-OFDM) and Zak-OTFS across the full range of 6G propagation environments. The performance results provide insights into the fundamental architectural choice.

Submitted 20 May, 2025; originally announced May 2025.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2505.11396 [pdf, ps, other]

doi 10.1145/3711896.3736960

Finding Counterfactual Evidences for Node Classification

Authors: Dazhuo Qiu, Jinwen Chen, Arijit Khan, Yan Zhao, Francesco Bonchi

Abstract: Counterfactual learning is emerging as an important paradigm, rooted in causality, which promises to alleviate common issues of graph neural networks (GNNs), such as fairness and interpretability. However, as in many real-world application domains where conducting randomized controlled trials is impractical, one has to rely on available observational (factual) data to detect counterfactuals. In this paper, we introduce and tackle the problem of searching for counterfactual evidences for the GNN-based node classification task. A counterfactual evidence is a pair of nodes such that, although they exhibit great similarity both in the features and in their neighborhood subgraph structures, they are classified differently by the GNN. We develop effective and efficient search algorithms and a novel indexing solution that leverages both node features and structural information to identify counterfactual evidences, and generalizes beyond any specific GNN. Through various downstream applications, we demonstrate the potential of counterfactual evidences to enhance fairness and accuracy of GNNs.

Submitted 2 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

Comments: Accepted by KDD 2025

arXiv:2505.10879 [pdf, ps, other]

Multi-Stage Speaker Diarization for Noisy Classrooms

Authors: Ali Sartaz Khan, Tolulope Ogunremi, Ahmed Adel Attia, Dorottya Demszky

Abstract: Speaker diarization, the process of identifying "who spoke when" in audio recordings, is essential for understanding classroom dynamics. However, classroom settings present distinct challenges, including poor recording quality, high levels of background noise, overlapping speech, and the difficulty of accurately capturing children's voices. This study investigates the effectiveness of multi-stage diarization models using Nvidia's NeMo diarization pipeline. We assess the impact of denoising on diarization accuracy and compare various voice activity detection (VAD) models, including self-supervised transformer-based frame-wise VAD models. We also explore a hybrid VAD approach that integrates Automatic Speech Recognition (ASR) word-level timestamps with frame-level VAD predictions. We conduct experiments using two datasets from English speaking classrooms to separate teacher vs. student speech and to separate all speakers. Our results show that denoising significantly improves the Diarization Error Rate (DER) by reducing the rate of missed speech. Additionally, training on both denoised and noisy datasets leads to substantial performance gains in noisy conditions. The hybrid VAD model leads to further improvements in speech detection, achieving a DER as low as 17% in teacher-student experiments and 45% in all-speaker experiments. However, we also identified trade-offs between voice activity detection and speaker confusion.
Overall, our study highlights the effectiveness of multi-stage diarization models and integrating ASR-based information for enhancing speaker diarization in noisy classroom environments.

Submitted 27 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

arXiv:2505.10055 [pdf, ps, other]

PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language

Authors: Ijazul Haq, Yingjie Zhang, Irfan Ali Khan

Abstract: This paper evaluates the performance of Large Multimodal Models (LMMs) on Optical Character Recognition (OCR) in the low-resource Pashto language. Natural Language Processing (NLP) in Pashto faces several challenges due to the cursive nature of its script and a scarcity of structured datasets. To address this, we developed a synthetic Pashto OCR dataset, PsOCR, consisting of one million images annotated with bounding boxes at word, line, and document levels, suitable for training and evaluating models based on different architectures, including Convolutional Neural Networks (CNNs) and Transformers. PsOCR covers variations across 1,000 unique font families, colors, image sizes, and layouts. A benchmark subset of 10K images was selected to evaluate the performance of several LMMs, including seven open-source models: DeepSeek's Janus, InternVL, MiniCPM, Florence, and Qwen (3B and 7B), and four closed-source models: GPT-4o, Gemini, Claude, and Grok. Experimental results demonstrate that Gemini achieves the best performance among all models, whereas among open-source models, Qwen-7B stands out. This work provides an insightful assessment of the capabilities and limitations of current LMMs for OCR tasks in Pashto and establishes a foundation for further research not only in Pashto OCR but also for other similar scripts such as Arabic, Persian, and Urdu. PsOCR is available at https://github.com/zirak-ai/PashtoOCR.

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2505.09894 [pdf, ps, other]

Advancing Mobile UI Testing by Learning Screen Usage Semantics

Authors: Safwat Ali Khan

Abstract: The demand for quality in mobile applications has increased greatly given users' high reliance on them for daily tasks. Developers work tirelessly to ensure that their applications are both functional and user-friendly. In pursuit of this, Automated Input Generation (AIG) tools have emerged as a promising solution for testing mobile applications by simulating user interactions and exploring app functionalities. However, these tools face significant challenges in navigating complex Graphical User Interfaces (GUIs), and developers often have trouble understanding their output. More specifically, AIG tools face difficulties in navigating out of certain screens, such as login pages and advertisements, due to a lack of contextual understanding, which leads to suboptimal testing coverage. Furthermore, while AIG tools can provide interaction traces consisting of action and screen details, there is limited understanding of their coverage of higher-level functionalities, such as logging in, setting alarms, or saving notes. Understanding these covered use cases is essential to ensure comprehensive test coverage of app functionalities. Difficulty in testing mobile UIs can lead to the design of complex interfaces, which can adversely affect users of advanced age who often face usability barriers due to small buttons, cluttered layouts, and unintuitive navigation. There exist many studies that highlight these issues, but automated solutions for improving UI accessibility need more attention.
This research seeks to enhance automated UI testing techniques by learning the screen usage semantics of mobile apps and helping them navigate more efficiently, offer more insights about tested functionalities and also improve the usability of a mobile app's interface by identifying and mitigating UI design issues.

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.07929 [pdf, ps, other]

Evidence that the Quantum Approximate Optimization Algorithm Optimizes the Sherrington-Kirkpatrick Model Efficiently in the Average Case

Authors: Sami Boulebnane, Abid Khan, Minzhao Liu, Jeffrey Larson, Dylan Herman, Ruslan Shaydulin, Marco Pistoia

Abstract: The Sherrington-Kirkpatrick (SK) model serves as a foundational framework for understanding disordered systems. The Quantum Approximate Optimization Algorithm (QAOA) is a quantum optimization algorithm whose performance monotonically improves with its depth $p$. We analyze QAOA applied to the SK model in the infinite-size limit and provide numerical evidence that it obtains a $(1-ε)$ approximation to the optimal energy with circuit depth $\mathcal{O}(n/ε^{1.13})$ in the average case. Our results are enabled by mapping the task of evaluating QAOA energy onto the task of simulating a spin-boson system, which we perform with modest cost using matrix product states. We optimize QAOA parameters and observe that QAOA achieves $\varepsilon\lesssim2.2\%$ at $p=160$ in the infinite-size limit. We then use these optimized QAOA parameters to evaluate the QAOA energy for finite-sized instances with up to $30$ qubits and find convergence to the ground state consistent with the infinite-size limit prediction. Our results provide strong numerical evidence that QAOA can efficiently approximate the ground state of the SK model in the average case.

Submitted 12 May, 2025; originally announced May 2025.

Comments: 17 pages, 5 figures

arXiv:2505.07635 [pdf, ps, other]

Interpreting Graph Inference with Skyline Explanations

Authors: Dazhuo Qiu, Haolai Che, Arijit Khan, Yinghui Wu

Abstract: Inference queries have been routinely issued to graph machine learning models such as graph neural networks (GNNs) for various network analytical tasks. Nevertheless, GNN outputs are often hard to interpret comprehensively. Existing methods typically compromise to individual pre-defined explainability measures (such as fidelity), which often leads to biased, "one-sided" interpretations. This paper introduces skyline explanation, a new paradigm that interprets GNN output by simultaneously optimizing multiple explainability measures of users' interests. (1) We propose skyline explanations as a Pareto set of explanatory subgraphs that dominate others over multiple explanatory measures. We formulate skyline explanation as a multi-criteria optimization problem, and establish its hardness results. (2) We design efficient algorithms with an onion-peeling approach, which strategically prioritizes nodes and removes unpromising edges to incrementally assemble skyline explanations. (3) We also develop an algorithm to diversify the skyline explanations to enrich the comprehensive interpretation. (4) We introduce efficient parallel algorithms with load-balancing strategies to scale skyline explanation for large-scale GNN-based inference. Using real-world and synthetic graphs, we experimentally verify our algorithms' effectiveness and scalability.

Submitted 3 July, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

arXiv:2505.07634 [pdf, ps, other]

Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

Authors: Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang

Abstract: The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the rise of embodied AI, where autonomous agents, such as humanoid robots, must navigate and manipulate unstructured environments with human-like adaptability. At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability. A Neural Brain must seamlessly integrate multimodal sensing and perception with cognitive capabilities. Achieving this also requires an adaptive memory system and energy-efficient hardware-software co-design, enabling real-time action in dynamic environments. This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges: (1) defining the core components of Neural Brain and (2) bridging the gap between static AI models and the dynamic adaptability required for real-world deployment. To this end, we propose a biologically inspired architecture that integrates multimodal active sensing, perception-cognition-action function, neuroplasticity-based memory storage and updating, and neuromorphic hardware/software optimization.
Furthermore, we also review the latest research on embodied agents across these four aspects and analyze the gap between current AI systems and human intelligence. By synthesizing insights from neuroscience, we outline a roadmap towards the development of generalizable, autonomous agents capable of human-level intelligence in real-world scenarios.

Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

Comments: 51 pages, 17 figures, 9 tables

arXiv:2505.07435 [pdf, ps, other]

Predictions for Identified Hadron ($π^\pm$, $K^\pm$ and $p(\overline{p})$) Production and Collective Dynamics in Oxygen-Oxygen Collisions at $\sqrt{s_{NN}}$= 7 TeV with EPOS4, AMPT-SM, and Angantyr in Pythia 8

Authors: Rabia Bashir, Ramoona Shehzadi, M. U. Ashraf, A. M. Khan

Abstract: We study the dynamics of identified hadron ($π^\pm$, $K^\pm$ and $p(\overline{p})$) production in $O+O$ collisions at $\sqrt{s_{\mathrm{NN}}} = 7$ TeV using the recently updated version of EPOS4, the string melting version of A Multi-Phase Transport Model (AMPT-SM), and the Angantyr model incorporated within Pythia 8. We examine the interplay between different mechanisms implemented in these models. Predictions for charged particle multiplicity ($dN_{ch}/dη$), transverse momentum ($p_T$) spectra of identified hadrons, particle yield ($dN/dy$) and mean transverse mass ($\langle m_T \rangle$) are presented. To probe the collective behavior of the produced particles, the $p_T$-differential kaon-to-pion and proton-to-pion ratios are studied. While AMPT incorporates some flow effects, EPOS4's implementation of full hydrodynamic flow proves significantly more effective. In contrast, the flow effects in Pythia 8 are substantially weaker compared to the other models. The upcoming $O+O$ data from the LHC will help constrain the parameters of these models.

Submitted 12 May, 2025; originally announced May 2025.

Comments: 10 pages, 5 figures

arXiv:2505.06674 [pdf, ps, other]

doi 10.1088/0031-8949/91/5/055301

Temperature-dependent nuclear partition functions and abundances in stellar interior

Authors: Jameel-Un Nabi, Abdel Nasser Tawfik, Nada Ezzelarab, Ali Abas Khan

Abstract: We calculate temperature-dependent nuclear partition functions (TDNPFs) and nuclear abundances for $728$ nuclei assuming nuclear statistical equilibrium (NSE). The theories of stellar evolution support NSE. Discrete nuclear energy levels have been calculated \textit{microscopically}, using the pn-QRPA theory, up to an excitation energy of $10$ MeV in the calculation of TDNPFs. This feature of our paper distinguishes it from previous calculations. Experimental data is also incorporated wherever available to ensure reliability of our results. Beyond 10 MeV we employ a simple Fermi gas model and perform integration over the nuclear level densities to approximate the TDNPFs. We calculate nuclidic abundances, using the Saha equation, as a function of three parameters: stellar density, stellar temperature and lepton-to-baryon content of stellar matter. All these physical parameters are considered to be extremely important in the stellar interior. Results obtained in this paper show that the equilibrium configuration of nuclei remains unaltered by increasing stellar density (only the calculated nuclear abundances increase, by roughly the same order of magnitude). Increasing the stellar temperature smooths the equilibrium configuration showing peaks at neutron-number magic nuclei.

Submitted 10 May, 2025; originally announced May 2025.

Comments: 42 pages, 21 tables, 8 figures

Journal ref: Physica Scripta, 91(5), 055301 (2016)

arXiv:2505.06229 [pdf, ps, other]

Neural Network Operator-Based Fractal Approximation: Smoothness Preservation and Convergence Analysis

Authors: Aaqib Ayoub Bhat, Asif Khan, M. Mursaleen

Abstract: This paper presents a new approach to constructing $α$-fractal interpolation functions (FIFs) using neural network operators, integrating concepts from approximation theory. Initially, we construct $α$-fractals utilizing neural network-based operators, providing an approach to generating fractal functions with interpolation properties. Based on the same foundation, we have developed fractal interpolation functions that utilize only the values of the original function at the nodes or partition points, unlike traditional methods that rely on the entire original function. Further, we have constructed $α$-fractals that preserve the smoothness of functions under certain constraints by employing a four-layered neural network operator, ensuring that if $f \in C^{r}[a,b]$, then the corresponding fractal $f^α \in C^{r}[a,b]$. Furthermore, we analyze the convergence of these $α$-fractals to the original function under suitable conditions. The work uses key approximation theory tools, such as the modulus of continuity and interpolation operators, to develop convergence results and uniform approximation error bounds.

Submitted 22 March, 2025; originally announced May 2025.

Comments: 18 pages

MSC Class: 28A80; 41A05; 41A25; 41A29; 41A30; 65D05

arXiv:2505.06128 [pdf, other]

Above-room-temperature ferromagnetism in large-area epitaxial Fe3GaTe2/graphene van der Waals heterostructures

Authors: Tauqir Shinwari, Kacho Imtiyaz Ali Khan, Hua Lv, Atekelte Abebe Kassa, Frans Munnik, Simon Josephy, Achim Trampert, Victor Ukleev, Chen Luo, Florin Radu, Jens Herfort, Michael Hanke, Joao Marcelo Jordao Lopes

Abstract: Fe3GaTe2 (FGaT), a two-dimensional (2D) layered ferromagnetic metal, exhibits a high Curie temperature (TC) ~ 360 K along with strong perpendicular magnetic anisotropy (PMA), making it a promising material candidate for next-generation energy-efficient magnetic devices. However, the vast majority of studies on FGaT to date have been limited to millimeter-sized bulk crystals and exfoliated flakes, which are unsuitable for practical applications and integration into device processing. Also, its combination with other 2D materials to form van der Waals heterostructures has only been achieved by flake stacking. Consequently, the controlled large-scale growth of FGaT and related heterostructures remains largely unexplored. In this work, we demonstrate a breakthrough in the high-quality, large-scale growth of epitaxial FGaT thin films on single-crystalline graphene/SiC templates using molecular beam epitaxy. Structural characterization confirms the high crystalline quality of the continuous FGaT/graphene van der Waals heterostructures. Temperature-dependent magnetization and anomalous Hall measurements reveal robust PMA with an enhanced TC well above room temperature, reaching up to 400 K. Furthermore, X-ray absorption and X-ray magnetic circular dichroism spectra provide insight into the spin and orbital magnetic moment contributions, further validating the high TC and robust PMA. These findings are highly significant for the future development of high-performance spintronic devices based on 2D heterostructures, with potential applications in next-generation data storage, logic processing and quantum technologies.

Submitted 9 May, 2025; originally announced May 2025.

arXiv:2505.04318 [pdf, other]

Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing

Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan

Abstract: As the adoption of deep learning models has grown beyond human capacity for verification, meta-algorithms are needed to ensure reliable model inference. Concept drift detection is a field dedicated to identifying statistical shifts that is underutilized in monitoring neural networks that may encounter inference data with distributional characteristics diverging from their training data. Given the wide variety of model architectures, applications, and datasets, it is important that concept drift detection algorithms are adaptable to different inference scenarios. In this paper, we introduce an application of the $χ^2$ Goodness of Fit Hypothesis Test as a drift detection meta-algorithm applied to a multilayer perceptron, a convolutional neural network, and a transformer trained for machine vision as they are exposed to simulated drift during inference. To that end, we demonstrate how unexpected drops in accuracy due to concept drift can be detected without directly examining the inference outputs. Our approach enhances safety by ensuring models are continually evaluated for reliability across varying conditions.

Submitted 7 May, 2025; originally announced May 2025.

Comments: 8 pages, 6 figures, 1 table

arXiv:2505.03931 [pdf, other]

NMPC-Lander: Nonlinear MPC with Barrier Function for UAV Landing on a Mobile Platform

Authors: Amber Batool, Faryal Batool, Roohan Ahmed Khan, Muhammad Ahsan Mustafa, Aleksey Fedoseev, Dzmitry Tsetserukou

Abstract: Quadcopters are versatile aerial robots gaining popularity in numerous critical applications. However, their operational effectiveness is constrained by limited battery life and restricted flight range. To address these challenges, autonomous drone landing on stationary or mobile charging and battery-swapping stations has become an essential capability. In this study, we present NMPC-Lander, a novel control architecture that integrates Nonlinear Model Predictive Control (NMPC) with Control Barrier Functions (CBF) to achieve precise and safe autonomous landing on both static and dynamic platforms. Our approach employs NMPC for accurate trajectory tracking and landing, while simultaneously incorporating CBF to ensure collision avoidance with static obstacles. Experimental evaluations on the real hardware demonstrate high precision in landing scenarios, with an average final position error of 9.0 cm and 11 cm for stationary and mobile platforms, respectively. Notably, NMPC-Lander outperforms the B-spline combined with the A* planning method by nearly threefold in terms of position tracking, underscoring its superior robustness and practical effectiveness.

Submitted 6 May, 2025; originally announced May 2025.

Comments: This manuscript has been submitted to the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2025

arXiv:2505.03787 [pdf, other]

ArrhythmiaVision: Resource-Conscious Deep Learning Models with Visual Explanations for ECG Arrhythmia Classification

Authors: Zuraiz Baig, Sidra Nasir, Rizwan Ahmed Khan, Muhammad Zeeshan Ul Haque

Abstract: Cardiac arrhythmias are a leading cause of life-threatening cardiac events, highlighting the urgent need for accurate and timely detection. Electrocardiography (ECG) remains the clinical gold standard for arrhythmia diagnosis; however, manual interpretation is time-consuming, dependent on clinical expertise, and prone to human error. Although deep learning has advanced automated ECG analysis, many existing models abstract away the signal's intrinsic temporal and morphological features, lack interpretability, and are computationally intensive, hindering their deployment on resource-constrained platforms. In this work, we propose two novel lightweight 1D convolutional neural networks, ArrhythmiNet V1 and V2, optimized for efficient, real-time arrhythmia classification on edge devices. Inspired by MobileNet's depthwise separable convolutional design, these models maintain memory footprints of just 302.18 KB and 157.76 KB, respectively, while achieving classification accuracies of 0.99 (V1) and 0.98 (V2) on the MIT-BIH Arrhythmia Dataset across five classes: Normal Sinus Rhythm, Left Bundle Branch Block, Right Bundle Branch Block, Atrial Premature Contraction, and Premature Ventricular Contraction. In order to ensure clinical transparency and relevance, we integrate Shapley Additive Explanations and Gradient-weighted Class Activation Mapping, enabling both local and global interpretability. These techniques highlight physiologically meaningful patterns such as the QRS complex and T-wave that contribute to the model's predictions. We also discuss performance-efficiency trade-offs and address current limitations related to dataset diversity and generalizability. Overall, our findings demonstrate the feasibility of combining interpretability, predictive accuracy, and computational efficiency in practical, wearable, and embedded ECG monitoring systems.

Submitted 30 April, 2025; originally announced May 2025.

Comments: 14 pages and 8 figures

arXiv:2505.03406 [pdf, other]

Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation

Authors: Mohammad Shoaib Ansari, Mohd Sohail Ali Khan, Shubham Revankar, Aditya Varma, Anil S. Mokhade

Abstract: This research paper investigates the application of Large Language Models (LLMs) in healthcare, specifically focusing on enhancing medical decision support through Retrieval-Augmented Generation (RAG) integrated with hospital-specific data and fine-tuning using Quantized Low-Rank Adaptation (QLoRA). The system utilizes Llama 3.2-3B-Instruct as its foundation model. By embedding and retrieving context-relevant healthcare information, the system significantly improves response accuracy. QLoRA facilitates notable parameter efficiency and memory optimization, preserving the integrity of medical information through specialized quantization techniques. Our research also shows that our model performs relatively well on various medical benchmarks, indicating that it can be used to make basic medical suggestions. This paper details the system's technical components, including its architecture, quantization methods, and key healthcare applications such as enhanced disease prediction from patient symptoms and medical history, treatment suggestions, and efficient summarization of complex medical reports. We touch on the ethical considerations (patient privacy, data security, and the need for rigorous clinical validation) as well as the practical challenges of integrating such systems into real-world healthcare workflows. Furthermore, the lightweight quantized weights ensure scalability and ease of deployment even in low-resource hospital environments. Finally, the paper concludes with an analysis of the broader impact of LLMs on healthcare and outlines future directions for LLMs in medical settings.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 12 pages

arXiv:2505.03267 [pdf, other]

Solar Coronal Heating: Role of Kinetic and Inertial Alfvén Waves in Heating and Charged Particle Acceleration

Authors: Syed Ayaz, Gary P. Zank, Imran A. Khan, Yeimy J. Rivera, Andreas Shalchi, L. -L. Zhao

Abstract: A comprehensive understanding of solar coronal heating and charged particle acceleration remains one of the most critical challenges in space and astrophysical plasma physics. In this study, we explore the contribution of Alfvén waves, both in their kinetic (KAWs) and inertial (IAWs) regimes, to particle acceleration processes that ultimately lead to coronal heating. Using a kinetic plasma framework based on the generalized Vlasov-Maxwell model, we analyze the dynamics of these waves with a focus on the perpendicular components of the Poynting flux vectors and the net resonance speed of the particles. Our results show that both the magnitude and dissipation rate of the Poynting flux for KAWs and IAWs decrease with increasing electron-to-ion temperature ratio (T_e/T_i) and normalized perpendicular electron inertial length (c k_x / omega_pe). We evaluate the associated electric potentials and find that KAWs are significantly influenced in the high wavenumber (k_x rho_i) regime. IAWs, on the other hand, show a decrease in electric potential along the magnetic field and an increase across it when the perpendicular electric field (E_x) is enhanced. We also determine the net resonant speeds of particles in the perpendicular direction and show that these wave-particle interactions can efficiently heat the solar corona over large distances (R_Sun). Finally, we quantify the power transported by KAWs and IAWs through solar flux loop tubes, finding that both wave types deliver greater energy with increasing T_e/T_i and c k_x / omega_pe. These findings offer deeper insights into wave-driven heating and are relevant to solar wind and magnetospheric physics.

Submitted 6 May, 2025; originally announced May 2025.

Comments: Submitted to the Monthly Notices of the Royal Astronomical Society (MNRAS)

arXiv:2505.02946 [pdf, ps, other]

A variational multiscale approach to goal-oriented error estimation in finite element analysis of convection-diffusion-reaction equation problems

Authors: Sheraz Ahmed Khan, Ramon Codina, Hauke Gravenkamp

Abstract: This paper presents a goal-oriented a posteriori error estimation framework for linear functionals in the stabilized finite element discretization of the stationary convection-diffusion-reaction (CDR) equation. The theoretical framework for error estimation is based on the variational multiscale (VMS) concept, where the solution is decomposed into resolved (finite element) and unresolved (sub-grid) scales. In this work, we propose an orthogonal sub-grid scale (OSGS) method for a goal-oriented error estimation in VMS discretizations. In the OSGS approach, the space of the sub-grid scales (SGSs) is orthogonal to the finite element space. The error is estimated in the quantity of interest, given by the linear functional $Q(u)$ of the unknown $u$. If the SGS $u'$ is estimated, the error in the quantity of interest can be approximated by $Q(u')$. Our approach is compared with a duality-based a posteriori error estimation method, which requires the solution of an additional auxiliary problem. The results indicate that both methods yield similar error estimates, whereas the VMS-based explicit approach is computationally less expensive than the duality-based implicit approach. Numerical tests demonstrated the effectiveness of our proposed error estimation techniques in terms of the quantity of interest functionals.

Submitted 5 May, 2025; originally announced May 2025.

arXiv:2505.02531 [pdf, ps, other]

A posteriori error estimates for the finite element approximation of the convection-diffusion-reaction equation based on the variational multiscale concept

Authors: Ramon Codina, Hauke Gravenkamp, Sheraz Ahmed Khan

Abstract: In this study, we employ the variational multiscale (VMS) concept to develop a posteriori error estimates for the stationary convection-diffusion-reaction equation. The variational multiscale method is based on splitting the continuous part of the problem into a resolved scale (coarse scale) and an unresolved scale (fine scale). The unresolved scale (also known as the sub-grid scale) is modeled by choosing it proportional to the component of the residual orthogonal to the finite element space, leading to the orthogonal sub-grid scale (OSGS) method. The idea is then to use the modeled sub-grid scale as an error estimator, considering its contribution in the element interiors and on the edges. We present the results of the a priori analysis and two different strategies for the a posteriori error analysis for the OSGS method. Our proposal is to use a scaled norm of the sub-grid scales as an a posteriori error estimate in the so-called stabilized norm of the problem. This norm has control over the convective term, which is necessary for convection-dominated problems. Numerical examples show the reliable performance of the proposed error estimator compared to other error estimators belonging to the variational multiscale family.

Submitted 5 May, 2025; originally announced May 2025.

arXiv:2505.01863 [pdf, other]

Quantum Energy Teleportation across Multi-Qubit Systems using W-State Entanglement

Authors: Alif Elham Khan, Humayra Anjum, Mahdy Rahman Chowdhury

Abstract: Quantum-energy teleportation (QET) has so far only been realised on a two-qubit platform. Real-world communication, however, typically involves multiple parties. Here we design and experimentally demonstrate the first multi-qubit QET protocol using a robust W-state multipartite entanglement. Three-, four- and five-qubit circuits were executed both on noiseless simulators and on IBM superconducting hardware. In every case a single sender injects an energy E0 that is then deterministically and decrementally harvested by several remote receivers, confirming that energy introduced at one node can be redistributed among many entangled subsystems at light-speed-limited classical latency. Our results open a practical route toward energy-aware quantum networks.

Submitted 3 May, 2025; originally announced May 2025.

arXiv:2505.01435 [pdf, other]

AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine

Authors: Carlo Siebenschuh, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Arham Khan, Khalid Hossain, Yadu Babuji, Nicholas Chia, Venkatram Vishwanath, Rick Stevens, Arvind Ramanathan, Ian Foster, Robert Underwood

Abstract: Language models for scientific tasks are trained on text from scientific publications, most distributed as PDFs that require parsing. PDF parsing approaches range from inexpensive heuristics (for simple documents) to computationally intensive ML-driven systems (for complex or degraded ones). The choice of the "best" parser for a particular document depends on its computational cost and the accuracy of its output. To address these issues, we introduce an Adaptive Parallel PDF Parsing and Resource Scaling Engine (AdaParse), a data-driven strategy for assigning an appropriate parser to each document. We enlist scientists to select preferred parser outputs and incorporate this information through direct preference optimization (DPO) into AdaParse, thereby aligning its selection process with human judgment. AdaParse then incorporates hardware requirements and predicted accuracy of each parser to orchestrate computational resources efficiently for large-scale parsing campaigns. We demonstrate that AdaParse, when compared to state-of-the-art parsers, improves throughput by $17\times$ while still achieving comparable accuracy (0.2 percent better) on a benchmark set of 1000 scientific documents. AdaParse's combination of high accuracy and parallel scalability makes it feasible to parse large-scale scientific document corpora to support the development of high-quality, trillion-token-scale text datasets. The implementation is available at https://github.com/7shoe/AdaParse/

Submitted 23 April, 2025; originally announced May 2025.

Comments: This paper has been accepted at the Eighth Annual Conference on Machine Learning and Systems (MLSys 2025)

arXiv:2504.21831 [pdf, other]

Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization

Authors: Anas Anwarul Haq Khan, Utkarsh Verma, Prateek Chanda, Ganesh Ramakrishnan

Abstract: We introduce DEEVISum (Distilled Early Exit Vision language model for Summarization), a lightweight, efficient, and scalable vision language model designed for segment-wise video summarization. Leveraging multi-modal prompts that combine textual and audio-derived signals, DEEVISum incorporates Multi Stage Knowledge Distillation (MSKD) and Early Exit (EE) to strike a balance between performance and efficiency. MSKD offers a 1.33% absolute F1 improvement over baseline distillation (0.5%), while EE reduces inference time by approximately 21% with a 1.3 point drop in F1. Evaluated on the TVSum dataset, our best model PaLI Gemma2 3B + MSKD achieves an F1 score of 61.1, competing with the performance of significantly larger models, all while maintaining a lower computational footprint. We publicly release our code and processed dataset to support further research.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.21745 [pdf, other]

Exponential advantage in quantum sensing of correlated parameters

Authors: Sridhar Prabhu, Vladimir Kremenetski, Saeed A. Khan, Ryotatsu Yanagimoto, Peter L. McMahon

Abstract: Conventionally in quantum sensing, the goal is to estimate one or more unknown parameters that are assumed to be deterministic - that is, they do not change between shots of the quantum-sensing protocol. We instead consider the setting where the parameters are stochastic: each shot of the quantum-sensing protocol senses parameter values that come from independent random draws. In this work, we explore three examples where the stochastic parameters are correlated and show how using entanglement provides a benefit in classification or estimation tasks: (1) a two-parameter classification task, for which there is an advantage in the low-shot regime; (2) an $N$-parameter estimation task and a classification variant of it, for which an entangled sensor requires just a constant number (independent of $N$) of shots to achieve the same accuracy as an unentangled sensor using exponentially many (${\sim}2^N$) shots; (3) classifying the magnetization of a spin chain in thermal equilibrium, where the individual spins fluctuate but the total spin in one direction is conserved - this gives a practical setting in which stochastic parameters are correlated in a way that an entangled sensor can be designed to exploit. We also present a theoretical framework for assessing, for a given choice of entangled sensing protocol and distributions to discriminate between, how much advantage the entangled sensor would have over an unentangled sensor. Our work motivates the further study of sensing correlated stochastic parameters using entangled quantum sensors - and since classical sensors by definition cannot be entangled, our work shows the possibility for entangled quantum sensors to achieve an exponential advantage over classical sensors, in contrast to the typical quadratic advantage.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.20235 [pdf, other]

Dynamic output-based feedback stabilizability for linear parabolic equations with memory

Authors: Arbaz Khan, Sumit Mahajan, Sérgio S. Rodrigues

Abstract: The stabilizability of a general class of linear parabolic equations with a memory term is achieved by explicit output feedback. The control input is given as a function of a state-estimate provided by an exponential dynamic Luenberger observer based on the output of sensor measurements. The numbers of actuators and sensors are finite. The feedback input and output injection operators are given explicitly involving appropriate orthogonal projections. For exponential kernels, exponential stabilizability can be achieved with the rate of the exponential kernel. The discretization and simulation of the controlled systems are addressed as well and results of simulations are reported showing the performance of the proposed dynamic output-based control feedback input. We include simulations for both exponential and weakly singular Riesz kernels, showing the success of the strategy in obtaining a stabilizing input.

Submitted 28 April, 2025; originally announced April 2025.

Comments: 15 figures

arXiv:2504.19461 [pdf]

The Role of Generative AI in Strengthening Secure Software Coding Practices: A Systematic Perspective

Authors: Hathal S. Alwageed, Rafiq Ahmad Khan

Abstract: As software security threats continue to evolve, the demand for innovative approaches to secure coding has grown tremendously. The integration of Generative AI (GenAI) into software development holds significant potential for improving secure coding practices. This paper systematically studies the impact of GenAI on secure coding practices and software security, setting forth its potential benefits, challenges, and implications. To outline the contribution of AI-driven code generation tools, we analyze, through a structured review of recent literature, industry applications, and empirical studies, how these tools help to mitigate security risks, comply with secure coding standards, and make software development efficient. We hope that our findings will benefit researchers, software engineers, and cybersecurity professionals alike in integrating GenAI into a secure development workflow without losing the advantages GenAI provides. Finally, the state-of-the-art advances and future directions of AI-assisted secure software engineering discussed in this study can contribute to the ongoing discourse on the topic.

Submitted 28 April, 2025; originally announced April 2025.

Comments: 1-6 pages

arXiv:2504.19271 [pdf, other]

Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection

Authors: Athul M. Mathew, Arshad Ali Khan, Thariq Khalid, Faroq AL-Tam, Riad Souissi

Abstract: Gaze target detection (GTD) is the task of predicting where a person in an image is looking. This is a challenging task, as it requires the ability to understand the relationship between the person's head, body, and eyes, as well as the surrounding environment. In this paper, we propose a novel method for GTD that fuses multiple pieces of information extracted from an image. First, we project the 2D image into a 3D representation using monocular depth estimation. We then extract a depth-infused saliency module map, which highlights the most salient (attention-grabbing) regions in the image for the subject in consideration. We also extract face and depth modalities from the image, and finally fuse all the extracted modalities to identify the gaze target. We quantitatively evaluated our method, including the ablation analysis on three publicly available datasets, namely VideoAttentionTarget, GazeFollow and GOO-Real, and showed that it outperforms other state-of-the-art methods. This suggests that our method is a promising new approach for GTD.

Submitted 27 April, 2025; originally announced April 2025.

Comments: accepted at NeurIPS 2023 Gaze Meets ML Workshop

arXiv:2504.18856 [pdf, other]

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation

Authors: Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Ganapathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, Arif Mahmood

Abstract: In Computational Pathology (CPath), the introduction of Vision-Language Models (VLMs) has opened new avenues for research, focusing primarily on aligning image-text pairs at a single magnification level. However, this approach might not be sufficient for tasks like cancer subtype classification, tissue phenotyping, and survival analysis due to the limited level of detail that a single-resolution image can provide. Addressing this, we propose a novel multi-resolution paradigm leveraging Whole Slide Images (WSIs) to extract histology patches at multiple resolutions and generate corresponding textual descriptions through advanced CPath VLM. We introduce visual-textual alignment at multiple resolutions as well as cross-resolution alignment to establish more effective text-guided visual representations. Cross-resolution alignment using a multimodal encoder enhances the model's ability to capture context from multiple resolutions in histology images. Our model aims to capture a broader range of information; supported by novel loss functions, it enriches feature representation, improves discriminative ability, and enhances generalization across different resolutions. Pre-trained on a comprehensive TCGA dataset with 34 million image-language pairs at various resolutions, our fine-tuned model outperforms state-of-the-art (SOTA) counterparts across multiple datasets and tasks, demonstrating its effectiveness in CPath. The code is available on GitHub at: https://github.com/BasitAlawode/MR-PLIP

Submitted 26 April, 2025; originally announced April 2025.

arXiv:2504.15995 [pdf, other]

OPUS-VFL: Incentivizing Optimal Privacy-Utility Tradeoffs in Vertical Federated Learning

Authors: Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali, Jin-Hee Cho

Abstract: Vertical Federated Learning (VFL) enables organizations with disjoint feature spaces but shared user bases to collaboratively train models without sharing raw data. However, existing VFL systems face critical limitations: they often lack effective incentive mechanisms, struggle to balance privacy-utility tradeoffs, and fail to accommodate clients with heterogeneous resource capabilities. These challenges hinder meaningful participation, degrade model performance, and limit practical deployment. To address these issues, we propose OPUS-VFL, an Optimal Privacy-Utility tradeoff Strategy for VFL. OPUS-VFL introduces a novel, privacy-aware incentive mechanism that rewards clients based on a principled combination of model contribution, privacy preservation, and resource investment. It employs a lightweight leave-one-out (LOO) strategy to quantify feature importance per client, and integrates an adaptive differential privacy mechanism that enables clients to dynamically calibrate noise levels to optimize their individual utility. Our framework is designed to be scalable, budget-balanced, and robust to inference and poisoning attacks. Extensive experiments on benchmark datasets (MNIST, CIFAR-10, and CIFAR-100) demonstrate that OPUS-VFL significantly outperforms state-of-the-art VFL baselines in both efficiency and robustness. It reduces label inference attack success rates by up to 20%, increases feature inference reconstruction error (MSE) by over 30%, and achieves up to 25% higher incentives for clients that contribute meaningfully while respecting privacy and cost constraints. These results highlight the practicality and innovation of OPUS-VFL as a secure, fair, and performance-driven solution for real-world VFL.

Submitted 22 April, 2025; originally announced April 2025.

arXiv:2504.13534 [pdf, other]

CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models

Authors: Feiyang Li, Peng Fang, Zhan Shi, Arijit Khan, Fang Wang, Dan Feng, Weihao Wang, Xin Zhang, Yongjian Cui

Abstract: Chain-of-thought (CoT) reasoning boosts large language models' (LLMs) performance on complex tasks but faces two key limitations: a lack of reliability when solely relying on LLM-generated reasoning chains and interference from natural language reasoning steps with the models' inference process, also known as the inference logic of LLMs. To address these issues, we propose CoT-RAG, a novel reasoning framework with three key designs: (i) Knowledge Graph-driven CoT Generation, featuring knowledge graphs to modulate reasoning chain generation of LLMs, thereby enhancing reasoning credibility; (ii) Learnable Knowledge Case-aware RAG, which incorporates retrieval-augmented generation (RAG) into knowledge graphs to retrieve relevant sub-cases and sub-descriptions, providing LLMs with learnable information; (iii) Pseudo-Program Prompting Execution, which promotes greater logical rigor by guiding LLMs to execute reasoning tasks as pseudo-programs. Evaluations on nine public datasets spanning three reasoning tasks reveal significant accuracy gains--ranging from 4.0% to 44.3%--over state-of-the-art methods. Furthermore, tests on four domain-specific datasets demonstrate exceptional accuracy and efficient execution, underscoring its practical applicability and scalability.

Submitted 18 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.13242 [pdf, other]

Dynamic Memory-enhanced Transformer for Hyperspectral Image Classification

Authors: Muhammad Ahmad, Manuel Mazzara, Salvatore Distefano, Adil Mehmood Khan

Abstract: Hyperspectral image (HSI) classification remains a challenging task due to the intricate spatial-spectral correlations. Existing transformer models excel in capturing long-range dependencies but often suffer from information redundancy and attention inefficiencies, limiting their ability to model fine-grained relationships crucial for HSI classification. To overcome these limitations, this work proposes MemFormer, a lightweight and memory-enhanced transformer. MemFormer introduces a memory-enhanced multi-head attention mechanism that iteratively refines a dynamic memory module, enhancing feature extraction while reducing redundancy across layers. Additionally, a dynamic memory enrichment strategy progressively captures complex spatial and spectral dependencies, leading to more expressive feature representations. To further improve structural consistency, we incorporate a spatial-spectral positional encoding (SSPE) tailored for HSI data, ensuring continuity without the computational burden of convolution-based approaches. Extensive experiments on benchmark datasets demonstrate that MemFormer achieves superior classification accuracy, outperforming state-of-the-art methods.

Submitted 17 April, 2025; originally announced April 2025.

arXiv:2504.13041 [pdf, other]

QI-MPC: A Hybrid Quantum-Inspired Model Predictive Control for Learning Optimal Policies

Authors: Muhammad Al-Zafar Khan, Jamal Al-Karaki

Abstract: In this paper, we present Quantum-Inspired Model Predictive Control (QIMPC), an approach that uses Variational Quantum Circuits (VQCs) to learn control policies in MPC problems. The viability of the approach is tested in five experiments: a target-tracking control strategy, energy-efficient building climate control, autonomous vehicular dynamics, the simple pendulum, and the compound pendulum. Three safety guarantees were established for the approach, and the experiments gave the motivation for two important theoretical results that, in essence, identify systems for which the approach works best.

Submitted 17 April, 2025; originally announced April 2025.

Comments: 41 pages, 21 figures

arXiv:2504.12399 [pdf, other]

A tensor network approach to sensing quantum light-matter interactions

Authors: Aiman Khan, Francesco Albarelli, Animesh Datta

Abstract: We present the fundamental limits to the precision of estimating parameters of a quantum matter system probed by light, even when some of the light is lost. This practically inevitable scenario leads to a tripartite quantum system of matter, and light -- detected and lost. Evaluating fundamental information theoretic quantities such as the quantum Fisher information of only the detected light was heretofore impossible. We succeed by expressing the final quantum state of the detected light as a matrix product operator. We apply our method to resonance fluorescence and pulsed spectroscopy. For both, we quantify the sub-optimality of continuous homodyning and photo-counting measurements in parameter estimation. For the latter, we find that single-photon Fock state pulses allow higher precision per photon than pulses of coherent states. Our method should be valuable in studies of quantum light-matter interactions, quantum light spectroscopy, quantum stochastic thermodynamics, and quantum clocks.

Submitted 16 April, 2025; originally announced April 2025.

Comments: 21 pages, 5 figures. See related work by D. Yang et al

arXiv:2504.12088 [pdf, ps, other]

AttentionDrop: A Novel Regularization Method for Transformer Models

Authors: Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah

Abstract: Transformer-based architectures achieve state-of-the-art performance across a wide range of tasks in natural language processing, computer vision, and speech. However, their immense capacity often leads to overfitting, especially when training data is limited or noisy. We propose AttentionDrop, a unified family of stochastic regularization techniques that operate directly on the self-attention distributions. We introduce three variants: 1. Hard Attention Masking: randomly zeroes out top-k attention logits per query to encourage diverse context utilization. 2. Blurred Attention Smoothing: applies a dynamic Gaussian convolution over attention logits to diffuse overly peaked distributions. 3. Consistency-Regularized AttentionDrop: enforces output stability under multiple independent AttentionDrop perturbations via a KL-based consistency loss.

Submitted 16 April, 2025; originally announced April 2025.

Comments: 26 pages

arXiv:2504.10964 [pdf, other]

Distributed Optimization with Gradient Tracking over Heterogeneous Delay-Prone Directed Networks

Authors: Evagoras Makridis, Gabriele Oliva, Kasagatta Ramesh Narahari, Mohammadreza Doostmohammadian, Usman A. Khan, Themistoklis Charalambous

Abstract: In this paper, we address the distributed optimization problem over unidirectional networks with possibly time-invariant heterogeneous bounded transmission delays. In particular, we propose a modified version of the Accelerated Distributed Directed OPTimization (ADD-OPT) algorithm, herein called Robustified ADD-OPT (R-ADD-OPT), which is able to solve the distributed optimization problem, even when the communication links suffer from heterogeneous but bounded transmission delays. We show that if the gradient step-size of the R-ADD-OPT algorithm is within a certain range, which also depends on the maximum time delay in the network, then the nodes are guaranteed to converge to the optimal solution of the distributed optimization problem. The range of the gradient step-size that guarantees convergence can be computed a priori based on the maximum time delay in the network.

Submitted 16 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

arXiv:2504.10677 [pdf, other]

Achieving Optimal Tissue Repair Through MARL with Reward Shaping and Curriculum Learning

Authors: Muhammad Al-Zafar Khan, Jamal Al-Karaki

Abstract: In this paper, we present a multi-agent reinforcement learning (MARL) framework for optimizing tissue repair processes using engineered biological agents. Our approach integrates: (1) stochastic reaction-diffusion systems modeling molecular signaling, (2) neural-like electrochemical communication with Hebbian plasticity, and (3) a biologically informed reward function combining chemical gradient tracking, neural synchronization, and robust penalties. A curriculum learning scheme guides the agent through progressively complex repair scenarios. In silico experiments demonstrate emergent repair strategies, including dynamic secretion control and spatial coordination.

Submitted 14 April, 2025; originally announced April 2025.

Comments: 14 pages, 4 figures, submitted to the 10th International Conference on Information and Communication Technology for Intelligent Systems (ICTIS)

arXiv:2504.10374 [pdf, other]

Ctrl-Z: Controlling AI Agents via Resampling

Authors: Aryan Bhatt, Cody Rushing, Adam Kaufman, Tyler Tracy, Vasil Georgiev, David Matolcsi, Akbir Khan, Buck Shlegeris

Abstract: Control evaluations measure whether monitoring and security protocols for AI systems prevent intentionally subversive AI models from causing harm. Our work presents the first control evaluation performed in an agent environment. We construct BashBench, a dataset of 257 challenging multi-step system administration tasks, and evaluate whether various safety measures can prevent an adversarially constructed AI agent from covertly downloading and executing malicious code in this environment. This multi-step setting introduces new attack and defense dynamics, which we investigate in order to design novel control protocols that prevent safety failures without hindering the ability of non-malicious agents to perform useful work. We introduce a class of control protocols called resample protocols that dynamically take additional samples of certain actions. We find these protocols significantly improve on existing techniques by selectively blocking the AI agent from executing suspicious code and incriminating the agent by generating additional examples of dangerous behavior. We measure the tradeoff between attack prevention and usefulness; our best protocol combines resampling with analysis of previous steps, reducing the success rate of attacks from 58% to 7% at a 5% cost to the performance of a non-malicious agent.

Submitted 14 April, 2025; originally announced April 2025.

Comments: bashcontrol.com

arXiv:2504.09713 [pdf, other]

A Full Spectrum of 3D Ferroelectric Memory Architectures Shaped by Polarization Sensing

Authors: Jiahui Duan, Asif Khan, Xiao Gong, Vijaykrishnan Narayanan, Kai Ni

Abstract: Ferroelectric memories have attracted significant interest due to their non-volatile storage, energy efficiency, and fast operation, making them prime candidates for future memory technologies. As commercial Dynamic Random Access Memory (DRAM) and NAND flash memory are transitioning or have moved toward three-dimensional (3D) integration, 3D ferroelectric memory architectures are also emerging, provided they can achieve a competitive position within the modern memory hierarchy. Given the excellent scalability of ferroelectric HfO2, various dense 3D integrated ferroelectric memory architectures are feasible, each offering unique strengths and facing distinct challenges. In this work, we present a comprehensive classification of 3D ferroelectric memory architectures based on polarization sensing methods, highlighting their critical role in shaping memory cell design and operational efficiency. Through a systematic evaluation of these architectures, we develop a unified framework to assess their advantages and trade-offs. This classification not only enhances the understanding of current 3D ferroelectric memory technologies but also lays the foundation for designing next-generation architectures optimized for advanced computing and high-performance applications.

Submitted 13 April, 2025; originally announced April 2025.

Comments: 65 pages, 5 figures

arXiv:2504.08340 [pdf, other]

All-in-Memory Stochastic Computing using ReRAM

Authors: João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan

Abstract: As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks. Stochastic computing (SC) offers a promising alternative by approximating complex arithmetic operations, such as addition and multiplication, using simple bitwise operations, like majority or AND, on random bit-streams. While SC operations are inherently fault-tolerant, their accuracy largely depends on the length and quality of the stochastic bit-streams (SBS). These bit-streams are typically generated by CMOS-based stochastic bit-stream generators that consume over 80% of the SC system's power and area. Current SC solutions focus on optimizing the logic gates but often neglect the high cost of moving the bit-streams between memory and processor. This work leverages the physics of emerging ReRAM devices to implement the entire SC flow in place: (1) generating low-cost true random numbers and SBSs, (2) conducting SC operations, and (3) converting SBSs back to binary. Considering the low reliability of ReRAM cells, we demonstrate how SC's robustness to errors copes with ReRAM's variability. Our evaluation shows significant improvements in throughput (1.39x, 2.16x) and energy consumption (1.15x, 2.8x) over state-of-the-art (CMOS- and ReRAM-based) solutions, respectively, with an average image quality drop of 5% across multiple SBS lengths and image processing tasks.

Submitted 11 April, 2025; originally announced April 2025.

Comments: 7 pages, 5 figures, To appear in DAC 2025
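
Illustrative note on the entry above (arXiv:2504.08340): the abstract describes approximating multiplication with a bitwise AND on random bit-streams and then converting the streams back to binary. The following is a minimal software sketch of that general idea only, assuming the common unipolar encoding in which a value in [0, 1] is the probability of a 1 bit. It is not the paper's ReRAM implementation; the function names, the pseudo-random generator, and the stream length are hypothetical choices made for illustration.

```python
import random


def to_bitstream(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bit-stream.

    Each bit is 1 with probability p (software stand-in for a hardware
    stochastic bit-stream generator; an assumption of this sketch).
    """
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Convert a bit-stream back to a number: the fraction of 1 bits."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Approximate multiplication of two independent unipolar streams.

    For independent streams, P(a AND b) = P(a) * P(b).
    """
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)      # fixed seed so the run is repeatable
length = 4096               # longer streams give lower variance
a = to_bitstream(0.5, length, rng)
b = to_bitstream(0.4, length, rng)
estimate = from_bitstream(sc_multiply(a, b))
print(f"estimate={estimate:.3f} exact={0.5 * 0.4:.3f}")
```

The estimate is noisy by construction: its error shrinks roughly with the square root of the stream length, which is consistent with the abstract's remark that accuracy depends on the length and quality of the bit-streams.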

arXiv:2504.08208 [pdf, other]

How Good Are Large Language Models for Course Recommendation in MOOCs?

Authors: Boxuan Ma, Md Akib Zabed Khan, Tianyuan Yang, Agoritsa Polyzou, Shin'ichi Konomi

Abstract: Large Language Models (LLMs) have made significant strides in natural language processing and are increasingly being integrated into recommendation systems. However, their potential in educational recommendation systems has yet to be fully explored. This paper investigates the use of LLMs as a general-purpose recommendation model, leveraging their vast knowledge derived from large-scale corpora for course recommendation tasks. We explore a variety of approaches, ranging from prompt-based methods to more advanced fine-tuning techniques, and compare their performance against traditional recommendation models. Extensive experiments were conducted on a real-world MOOC dataset, evaluating LLMs as course recommendation systems across key dimensions such as accuracy, diversity, and novelty. Our results demonstrate that LLMs can achieve good performance comparable to traditional models, highlighting their potential to enhance educational recommendation systems. These findings pave the way for further exploration and development of LLM-based approaches in the context of educational recommendations.

Submitted 10 April, 2025; originally announced April 2025.

arXiv:2504.05809 [pdf, other]

Loss-free enhancement of photonic spin Hall shift by electromagnetically induced transparency

Authors: Kezhou Du, Aizaz Khan, Lei Gao, Muzamil Shah, Xinxing Zhou, Dongliang Gao

Abstract: The photonic spin Hall effect (PSHE), a result of spin-orbit interaction, has attracted significant interest because of its fundamental importance and potential applications. Optical losses are ubiquitous and inherently suppress the photonic spin Hall shift (PSHS). In this work, we consider an atomic medium that exhibits both absorption and transparency to investigate and mitigate the effects of loss on PSHS. We demonstrate that laser-induced coherence in an atomic medium, leading to electromagnetically induced transparency (EIT) at resonance, counteracts the detrimental effects of losses on the PSHS. Upon EIT in a coherent medium enclosed within dielectric slabs, the reflectivity of the incident polarized state is reduced near Brewster's angle to enhance PSHS. Moreover, the tunable refractive index of the atomic medium enables the manipulation of PSHS without structural modifications with a tiny loss. Our proposed loss-free approach to PSHS may enable advanced optical sensing and other spin-based applications.

Submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.04722 [pdf, other]

TactileNet: Bridging the Accessibility Gap with AI-Generated Tactile Graphics for Individuals with Vision Impairment

Authors: Adnan Khan, Alireza Choubineh, Mai A. Shaaban, Abbas Akkasi, Majid Komeili

Abstract: Tactile graphics are essential for providing access to visual information for the 43 million people globally living with vision loss. Traditional methods for creating these graphics are labor-intensive and cannot meet growing demand. We introduce TactileNet, the first comprehensive dataset and AI-driven framework for generating embossing-ready 2D tactile templates using text-to-image Stable Diffusion (SD) models. By integrating Low-Rank Adaptation (LoRA) and DreamBooth, our method fine-tunes SD models to produce high-fidelity, guideline-compliant graphics while reducing computational costs. Quantitative evaluations with tactile experts show 92.86% adherence to accessibility standards. Structural fidelity analysis revealed near-human design similarity, with an SSIM of 0.538 between generated graphics and expert-designed tactile images. Notably, our method preserves object silhouettes better than human designs (SSIM = 0.259 vs. 0.215 for binary masks), addressing a key limitation of manual tactile abstraction. The framework scales to 32,000 images (7,050 high-quality) across 66 classes, with prompt editing enabling customizable outputs (e.g., adding or removing details). By automating the 2D template generation step (compatible with standard embossing workflows), TactileNet accelerates production while preserving design flexibility. This work demonstrates how AI can augment (not replace) human expertise to bridge the accessibility gap in education and beyond. Code, data, and models will be publicly released to foster further research.

Submitted 15 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

arXiv:2504.04372 [pdf, other]

How Accurately Do Large Language Models Understand Code?

Authors: Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R. Butt, Mohammad Taha Khan, Muhammad Ali Gulzar

Abstract: Large Language Models (LLMs) are increasingly used in post-development tasks such as code repair and testing. A key factor in these tasks' success is the model's deep understanding of code. However, the extent to which LLMs truly understand code remains largely unevaluated. Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric. Previously, t… ▽ More Large Language Models (LLMs) are increasingly used in post-development tasks such as code repair and testing. A key factor in these tasks' success is the model's deep understanding of code. However, the extent to which LLMs truly understand code remains largely unevaluated. Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric. Previously, this was assessed through developer surveys, which are not feasible for evaluating LLMs. Existing LLM benchmarks focus primarily on code generation, fundamentally different from code comprehension. Additionally, fixed benchmarks quickly become obsolete as they become part of the training data. This paper presents the first large-scale empirical investigation into LLMs' ability to understand code. Inspired by mutation testing, we use an LLM's fault-finding ability as a proxy for its deep code understanding. This approach is based on the insight that a model capable of identifying subtle functional discrepancies must understand the code well. We inject faults in real-world programs and ask the LLM to localize them, ensuring the specifications suffice for fault localization. Next, we apply semantic-preserving code mutations (SPMs) to the faulty programs and test whether the LLMs still locate the faults, verifying their confidence in code understanding. We evaluate nine popular LLMs on 600,010 debugging tasks from 670 Java and 637 Python programs. 
We find that LLMs lose the ability to debug the same bug in 78% of faulty programs when SPMs are applied, indicating a shallow understanding of code and reliance on features irrelevant to semantics. We also find that LLMs understand code earlier in the program better than later. This suggests that LLMs' code comprehension remains tied to lexical and syntactic features due to tokenization designed for natural languages, which overlooks code semantics.

Submitted 9 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

Comments: This paper is currently Under Review. It consists of 11 pages, 12 Figures, and 5 Tables

arXiv:2504.04235 [pdf, other]

Quantum parallel information exchange (QPIE) hybrid network with transfer learning

Authors: Ziqing Guo, Alex Khan, Victor S. Sheng, Shabnam Jabeen, Ziwen Pan

Abstract: Quantum machine learning (QML) has emerged as an innovative framework with the potential to uncover complex patterns by leveraging quantum systems' ability to simulate and exploit high-dimensional latent spaces, particularly in learning tasks. Quantum neural network (QNN) frameworks are inherently sensitive to the precision of gradient calculations and the computational limitations of current quantum hardware, as unitary rotations introduce overhead from complex number computations, and the quantum gate operation speed remains a bottleneck for practical implementations. In this study, we introduce the quantum parallel information exchange (QPIE) hybrid network, a new non-sequential hybrid classical-quantum model architecture that leverages quantum transfer learning by feeding pre-trained parameters from classical neural networks into quantum circuits. This enables efficient pattern recognition and time-series data prediction by utilizing non-Clifford parameterized quantum gates, thereby enhancing both learning efficiency and representational capacity. Additionally, we develop a dynamic gradient selection method that applies the parameter shift rule on quantum processing units (QPUs) and adjoint differentiation on GPUs. Our results demonstrate model performance exhibiting higher accuracy in ad-hoc benchmarks, lowering approximately 88% convergence rate for extra stochasticity time-series data within 100 steps, and showcasing a more unbiased eigenvalue spectrum of the Fisher information matrix on CPU/GPU and IonQ QPU simulators.

Submitted 5 April, 2025; originally announced April 2025.

arXiv:2504.04124 [pdf, other]

EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection

Authors: Muhammad Ahmed Ullah Khan, Abdul Hannan Khan, Andreas Dengel

Abstract: Event cameras have higher temporal resolution and require less storage and bandwidth than traditional RGB cameras. However, due to the relatively lagging performance of event-based approaches, event cameras have not yet replaced traditional cameras in performance-critical applications like autonomous driving. Recent approaches in event-based object detection try to bridge this gap by employing computationally expensive transformer-based solutions. However, due to their resource-intensive components, these solutions fail to exploit the sparsity and higher temporal resolution of event cameras efficiently. Moreover, these solutions are adopted from the vision domain, lacking specificity to event cameras. In this work, we explore efficient and performant alternatives to recurrent vision transformer models and propose a novel event-based object detection backbone. The proposed backbone employs a novel Event Progression Extractor module, tailored specifically for event data, and uses the Metaformer concept with efficient convolution-based components. We evaluate the resultant model on well-established traffic object detection benchmarks and conduct cross-dataset evaluation to test its ability to generalize. The proposed model outperforms the state-of-the-art on the Prophesee Gen1 dataset by 1.6 mAP while reducing inference time by 14%. Our proposed EMF becomes the fastest DNN-based architecture in the domain by outperforming the most efficient event-based object detectors.
Moreover, the proposed model shows better ability to generalize to unseen data and scales better with the abundance of data.

Submitted 5 April, 2025; originally announced April 2025.

Comments: 10 pages, 2 figures

Showing 51–100 of 1,913 results for author: Khan, A