Showing 1–50 of 1,040 results for author: Khan, A

  1. arXiv:2505.10055  [pdf, ps, other]

    cs.CV cs.AI

    PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language

    Authors: Ijazul Haq, Yingjie Zhang, Irfan Ali Khan

    Abstract: This paper evaluates the performance of Large Multimodal Models (LMMs) on Optical Character Recognition (OCR) in the low-resource Pashto language. Natural Language Processing (NLP) in Pashto faces several challenges due to the cursive nature of its script and a scarcity of structured datasets. To address this, we developed a synthetic Pashto OCR dataset, PsOCR, consisting of one million images ann… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09894  [pdf, ps, other]

    cs.SE

    Advancing Mobile UI Testing by Learning Screen Usage Semantics

    Authors: Safwat Ali Khan

    Abstract: The demand for quality in mobile applications has increased greatly given users' high reliance on them for daily tasks. Developers work tirelessly to ensure that their applications are both functional and user-friendly. In pursuit of this, Automated Input Generation (AIG) tools have emerged as a promising solution for testing mobile applications by simulating user interactions and exploring app fu… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.07635  [pdf, ps, other]

    cs.LG cs.DB

    Generating Skyline Explanations for Graph Neural Networks

    Authors: Dazhuo Qiu, Haolai Che, Arijit Khan, Yinghui Wu

    Abstract: This paper proposes a novel approach to generate subgraph explanations for graph neural networks (GNNs) that simultaneously optimize multiple measures for explainability. Existing GNN explanation methods often compute subgraphs (called "explanatory subgraphs") that optimize a pre-defined, single explainability measure, such as fidelity or conciseness. This can lead to biased explanations that cann… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  4. arXiv:2505.07634  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

    Authors: Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang

    Abstract: The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the ris… ▽ More

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 51 pages, 17 figures, 9 tables

  5. arXiv:2505.06229  [pdf, ps, other]

    cs.LG math.NA

    Neural Network Operator-Based Fractal Approximation: Smoothness Preservation and Convergence Analysis

    Authors: Aaqib Ayoub Bhat, Asif Khan, M. Mursaleen

    Abstract: This paper presents a new approach of constructing $α$-fractal interpolation functions (FIFs) using neural network operators, integrating concepts from approximation theory. Initially, we construct $α$-fractals utilizing neural network-based operators, providing an approach to generating fractal functions with interpolation properties. Based on the same foundation, we have developed fractal interp… ▽ More

    Submitted 22 March, 2025; originally announced May 2025.

    Comments: 18 pages

    MSC Class: 28A80; 41A05; 41A25; 41A29; 41A30; 65D05

  6. arXiv:2505.04318  [pdf, other]

    cs.LG cs.AI eess.IV

    Detecting Concept Drift in Neural Networks Using Chi-squared Goodness of Fit Testing

    Authors: Jacob Glenn Ayers, Buvaneswari A. Ramanan, Manzoor A. Khan

    Abstract: As the adoption of deep learning models has grown beyond human capacity for verification, meta-algorithms are needed to ensure reliable model inference. Concept drift detection is a field dedicated to identifying statistical shifts that is underutilized in monitoring neural networks that may encounter inference data with distributional characteristics diverging from their training data. Given the… ▽ More

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 8 pages, 6 figures, 1 table

  7. arXiv:2505.03931  [pdf, other]

    cs.RO

    NMPC-Lander: Nonlinear MPC with Barrier Function for UAV Landing on a Mobile Platform

    Authors: Amber Batool, Faryal Batool, Roohan Ahmed Khan, Muhammad Ahsan Mustafa, Aleksey Fedoseev, Dzmitry Tsetserukou

    Abstract: Quadcopters are versatile aerial robots gaining popularity in numerous critical applications. However, their operational effectiveness is constrained by limited battery life and restricted flight range. To address these challenges, autonomous drone landing on stationary or mobile charging and battery-swapping stations has become an essential capability. In this study, we present NMPC-Lander, a nov… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: This manuscript has been submitted to the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2025

  8. arXiv:2505.03787  [pdf, other]

    cs.LG cs.AI eess.SP

    ArrhythmiaVision: Resource-Conscious Deep Learning Models with Visual Explanations for ECG Arrhythmia Classification

    Authors: Zuraiz Baig, Sidra Nasir, Rizwan Ahmed Khan, Muhammad Zeeshan Ul Haque

    Abstract: Cardiac arrhythmias are a leading cause of life-threatening cardiac events, highlighting the urgent need for accurate and timely detection. Electrocardiography (ECG) remains the clinical gold standard for arrhythmia diagnosis; however, manual interpretation is time-consuming, dependent on clinical expertise, and prone to human error. Although deep learning has advanced automated ECG analysis, many… ▽ More

    Submitted 30 April, 2025; originally announced May 2025.

    Comments: 14 pages and 08 figures

  9. arXiv:2505.03406  [pdf, other]

    cs.CL cs.AI

    Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation

    Authors: Mohammad Shoaib Ansari, Mohd Sohail Ali Khan, Shubham Revankar, Aditya Varma, Anil S. Mokhade

    Abstract: This research paper investigates the application of Large Language Models (LLMs) in healthcare, specifically focusing on enhancing medical decision support through Retrieval-Augmented Generation (RAG) integrated with hospital-specific data and fine-tuning using Quantized Low-Rank Adaptation (QLoRA). The system utilizes Llama 3.2-3B-Instruct as its foundation model. By embedding and retrieving cont… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 12 pages

  10. arXiv:2505.01435  [pdf, other]

    cs.IR cs.CL cs.DC cs.LG

    AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine

    Authors: Carlo Siebenschuh, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Arham Khan, Khalid Hossain, Yadu Babuji, Nicholas Chia, Venkatram Vishwanath, Rick Stevens, Arvind Ramanathan, Ian Foster, Robert Underwood

    Abstract: Language models for scientific tasks are trained on text from scientific publications, most distributed as PDFs that require parsing. PDF parsing approaches range from inexpensive heuristics (for simple documents) to computationally intensive ML-driven systems (for complex or degraded ones). The choice of the "best" parser for a particular document depends on its computational cost and the accurac… ▽ More

    Submitted 23 April, 2025; originally announced May 2025.

    Comments: This paper has been accepted at the Eighth Annual Conference on Machine Learning and Systems (MLSys 2025)

  11. arXiv:2504.21831  [pdf, other]

    cs.CV cs.AI

    Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization

    Authors: Anas Anwarul Haq Khan, Utkarsh Verma, Prateek Chanda, Ganesh Ramakrishnan

    Abstract: We introduce DEEVISum (Distilled Early Exit Vision language model for Summarization), a lightweight, efficient, and scalable vision language model designed for segment wise video summarization. Leveraging multi modal prompts that combine textual and audio derived signals, DEEVISum incorporates Multi Stage Knowledge Distillation (MSKD) and Early Exit (EE) to strike a balance between performance and… ▽ More

    Submitted 30 April, 2025; originally announced April 2025.

  12. arXiv:2504.19461  [pdf]

    cs.SE

    The Role of Generative AI in Strengthening Secure Software Coding Practices: A Systematic Perspective

    Authors: Hathal S. Alwageed, Rafiq Ahmad Khan

    Abstract: As software security threats continue to evolve, the demand for innovative ways of securing coding has tremendously grown. The integration of Generative AI (GenAI) into software development holds significant potential for improving secure coding practices. This paper aims at systematically studying the impact of GenAI in enhancing secure coding practices from improving software security, setting f… ▽ More

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 1-6 pages

  13. arXiv:2504.19271  [pdf, other]

    cs.CV

    Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection

    Authors: Athul M. Mathew, Arshad Ali Khan, Thariq Khalid, Faroq AL-Tam, Riad Souissi

    Abstract: Gaze target detection (GTD) is the task of predicting where a person in an image is looking. This is a challenging task, as it requires the ability to understand the relationship between the person's head, body, and eyes, as well as the surrounding environment. In this paper, we propose a novel method for GTD that fuses multiple pieces of information extracted from an image. First, we project the… ▽ More

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: accepted at NeurIPS 2023 Gaze Meets ML Workshop

  14. arXiv:2504.18856  [pdf, other]

    cs.CV

    Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation

    Authors: Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Ganapathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, Arif Mahmood

    Abstract: In Computational Pathology (CPath), the introduction of Vision-Language Models (VLMs) has opened new avenues for research, focusing primarily on aligning image-text pairs at a single magnification level. However, this approach might not be sufficient for tasks like cancer subtype classification, tissue phenotyping, and survival analysis due to the limited level of detail that a single-resolution i… ▽ More

    Submitted 26 April, 2025; originally announced April 2025.

  15. arXiv:2504.15995  [pdf, other]

    cs.LG cs.AI

    OPUS-VFL: Incentivizing Optimal Privacy-Utility Tradeoffs in Vertical Federated Learning

    Authors: Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali, Jin-Hee Cho

    Abstract: Vertical Federated Learning (VFL) enables organizations with disjoint feature spaces but shared user bases to collaboratively train models without sharing raw data. However, existing VFL systems face critical limitations: they often lack effective incentive mechanisms, struggle to balance privacy-utility tradeoffs, and fail to accommodate clients with heterogeneous resource capabilities. These cha… ▽ More

    Submitted 22 April, 2025; originally announced April 2025.

  16. arXiv:2504.13534  [pdf, other]

    cs.CL cs.AI

    CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models

    Authors: Feiyang Li, Peng Fang, Zhan Shi, Arijit Khan, Fang Wang, Dan Feng, Weihao Wang, Xin Zhang, Yongjian Cui

    Abstract: While chain-of-thought (CoT) reasoning improves the performance of large language models (LLMs) in complex tasks, it still has two main challenges: the low reliability of relying solely on LLMs to generate reasoning chains and the interference of natural language reasoning chains on the inference logic of LLMs. To address these issues, we propose CoT-RAG, a novel reasoning framework with three key… ▽ More

    Submitted 18 April, 2025; originally announced April 2025.

  17. arXiv:2504.13242  [pdf, other]

    cs.CV

    Dynamic Memory-enhanced Transformer for Hyperspectral Image Classification

    Authors: Muhammad Ahmad, Manuel Mazzara, Salvatore Distefano, Adil Mehmood Khan

    Abstract: Hyperspectral image (HSI) classification remains a challenging task due to the intricate spatial-spectral correlations. Existing transformer models excel in capturing long-range dependencies but often suffer from information redundancy and attention inefficiencies, limiting their ability to model fine-grained relationships crucial for HSI classification. To overcome these limitations, this work pr… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

  18. arXiv:2504.12088  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    AttentionDrop: A Novel Regularization Method for Transformer Models

    Authors: Mirza Samad Ahmed Baig, Syeda Anshrah Gillani, Abdul Akbar Khan, Shahid Munir Shah

    Abstract: Transformer-based architectures achieve state-of-the-art performance across a wide range of tasks in natural language processing, computer vision, and speech. However, their immense capacity often leads to overfitting, especially when training data is limited or noisy. We propose AttentionDrop, a unified family of stochastic regularization techniques that operate directly on the self-attention dis… ▽ More

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: 26 pages

  19. arXiv:2504.10677  [pdf, other]

    cs.LG cs.AI cs.MA

    Achieving Optimal Tissue Repair Through MARL with Reward Shaping and Curriculum Learning

    Authors: Muhammad Al-Zafar Khan, Jamal Al-Karaki

    Abstract: In this paper, we present a multi-agent reinforcement learning (MARL) framework for optimizing tissue repair processes using engineered biological agents. Our approach integrates: (1) stochastic reaction-diffusion systems modeling molecular signaling, (2) neural-like electrochemical communication with Hebbian plasticity, and (3) a biologically informed reward function combining chemical gradient t… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 14 pages, 4 figures, submitted to the 10th International Conference on Information and Communication Technology for Intelligent Systems (ICTIS)

  20. arXiv:2504.10374  [pdf, other]

    cs.LG

    Ctrl-Z: Controlling AI Agents via Resampling

    Authors: Aryan Bhatt, Cody Rushing, Adam Kaufman, Tyler Tracy, Vasil Georgiev, David Matolcsi, Akbir Khan, Buck Shlegeris

    Abstract: Control evaluations measure whether monitoring and security protocols for AI systems prevent intentionally subversive AI models from causing harm. Our work presents the first control evaluation performed in an agent environment. We construct BashBench, a dataset of 257 challenging multi-step system administration tasks, and evaluate whether various safety measures can prevent an adversarially cons… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: bashcontrol.com

  21. arXiv:2504.09713  [pdf, other]

    cs.ET

    A Full Spectrum of 3D Ferroelectric Memory Architectures Shaped by Polarization Sensing

    Authors: Jiahui Duan, Asif Khan, Xiao Gong, Vijaykrishnan Narayanan, Kai Ni

    Abstract: Ferroelectric memories have attracted significant interest due to their non-volatile storage, energy efficiency, and fast operation, making them prime candidates for future memory technologies. As commercial Dynamic Random Access Memory (DRAM) and NAND flash memory are transiting or have moved toward three-dimensional (3D) integration, 3D ferroelectric memory architectures are also emerging, provi… ▽ More

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 65 pages, 5 figures

  22. arXiv:2504.08340  [pdf, other]

    cs.ET cs.AR

    All-in-Memory Stochastic Computing using ReRAM

    Authors: João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan

    Abstract: As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks. Stochastic computing (SC) offers a promising alternative by approximating complex arithmetic operations, such as addition and multiplication, using simple bitwise operations, like majority or AND, on random bit-streams. While SC… ▽ More

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 7 pages, 5 figures, To appear in DAC 2025

  23. arXiv:2504.08208  [pdf, other]

    cs.IR cs.AI

    How Good Are Large Language Models for Course Recommendation in MOOCs?

    Authors: Boxuan Ma, Md Akib Zabed Khan, Tianyuan Yang, Agoritsa Polyzou, Shin'ichi Konomi

    Abstract: Large Language Models (LLMs) have made significant strides in natural language processing and are increasingly being integrated into recommendation systems. However, their potential in educational recommendation systems has yet to be fully explored. This paper investigates the use of LLMs as a general-purpose recommendation model, leveraging their vast knowledge derived from large-scale corpora fo… ▽ More

    Submitted 10 April, 2025; originally announced April 2025.

  24. arXiv:2504.04722  [pdf, other]

    cs.CV

    TactileNet: Bridging the Accessibility Gap with AI-Generated Tactile Graphics for Individuals with Vision Impairment

    Authors: Adnan Khan, Alireza Choubineh, Mai A. Shaaban, Abbas Akkasi, Majid Komeili

    Abstract: Tactile graphics are essential for providing access to visual information for the 43 million people globally living with vision loss. Traditional methods for creating these graphics are labor-intensive and cannot meet growing demand. We introduce TactileNet, the first comprehensive dataset and AI-driven framework for generating embossing-ready 2D tactile templates using text-to-image Stable Diffus… ▽ More

    Submitted 15 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  25. arXiv:2504.04372  [pdf, other]

    cs.SE cs.AI cs.LG

    How Accurately Do Large Language Models Understand Code?

    Authors: Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, Ali R. Butt, Mohammad Taha Khan, Muhammad Ali Gulzar

    Abstract: Large Language Models (LLMs) are increasingly used in post-development tasks such as code repair and testing. A key factor in these tasks' success is the model's deep understanding of code. However, the extent to which LLMs truly understand code remains largely unevaluated. Quantifying code comprehension is challenging due to its abstract nature and the lack of a standardized metric. Previously, t… ▽ More

    Submitted 9 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

    Comments: This paper is currently Under Review. It consists of 11 pages, 12 Figures, and 5 Tables

  26. arXiv:2504.04124  [pdf, other]

    cs.CV

    EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection

    Authors: Muhammad Ahmed Ullah Khan, Abdul Hannan Khan, Andreas Dengel

    Abstract: Event cameras have higher temporal resolution, and require less storage and bandwidth compared to traditional RGB cameras. However, due to relatively lagging performance of event-based approaches, event cameras have not yet replaced traditional cameras in performance-critical applications like autonomous driving. Recent approaches in event-based object detection try to bridge this gap by employing… ▽ More

    Submitted 5 April, 2025; originally announced April 2025.

    Comments: 10 pages, 2 figures

  27. arXiv:2504.02204  [pdf, other]

    cs.HC

    Characterizing Creativity in Visualization Design

    Authors: Naimul Hoque, Zinat Ara, Safwat Ali Khan, Fanny Chevalier, Niklas Elmqvist

    Abstract: Understanding the role of creativity in visualization design becomes increasingly important as the field matures, particularly with the emergence of various visualization authoring and recommendation systems. In this paper, we examine how creativity manifests in visualization design processes and how academic research has conceptualized it over time. Through a systematic review of 58 visualization… ▽ More

    Submitted 2 April, 2025; originally announced April 2025.

  28. Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations

    Authors: Adrián Sánchez-Mompó, Ioannis Mavromatis, Peizheng Li, Konstantinos Katsaros, Aftab Khan

    Abstract: This study presents an empirical investigation into the energy consumption of Discriminative and Generative AI models within real-world MLOps pipelines. For Discriminative models, we examine various architectures and hyperparameters during training and inference and identify energy-efficient practices. For Generative AI, Large Language Models (LLMs) are assessed, focusing primarily on energy consu… ▽ More

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: Published to MDPI Information - Artificial Intelligence Section

  29. arXiv:2503.19699  [pdf, other]

    cs.AI cs.MA

    Optimal Path Planning and Cost Minimization for a Drone Delivery System Via Model Predictive Control

    Authors: Muhammad Al-Zafar Khan, Jamal Al-Karaki

    Abstract: In this study, we formulate the drone delivery problem as a control problem and solve it using Model Predictive Control. Two experiments are performed: The first is on a less challenging grid world environment with lower dimensionality, and the second is with a higher dimensionality and added complexity. The MPC method was benchmarked against three popular Multi-Agent Reinforcement Learning (MARL)… ▽ More

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 15 pages, 5 figures, Submitted to the 2025 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications

  30. arXiv:2503.19365  [pdf, other]

    cs.DS

    Improved Approximation Algorithms for Three-Dimensional Knapsack

    Authors: Klaus Jansen, Debajyoti Kar, Arindam Khan, K. V. N. Sreenivas, Malte Tutas

    Abstract: We study the three-dimensional Knapsack (3DK) problem, in which we are given a set of axis-aligned cuboids with associated profits and an axis-aligned cube knapsack. The objective is to find a non-overlapping axis-aligned packing (by translation) of the maximum profit subset of cuboids into the cube. The previous best approximation algorithm is due to Diedrich, Harren, Jansen, Thöle, and Thomas (2… ▽ More

    Submitted 25 March, 2025; originally announced March 2025.

  31. arXiv:2503.19339  [pdf, other]

    cs.CR cs.AI

    Efficient IoT Intrusion Detection with an Improved Attention-Based CNN-BiLSTM Architecture

    Authors: Amna Naeem, Muazzam A. Khan, Nada Alasbali, Jawad Ahmad, Aizaz Ahmad Khattak, Muhammad Shahbaz Khan

    Abstract: The ever-increasing security vulnerabilities in the Internet-of-Things (IoT) systems require improved threat detection approaches. This paper presents a compact and efficient approach to detect botnet attacks by employing an integrated approach that consists of traffic pattern analysis, temporal support learning, and focused feature extraction. The proposed attention-based model benefits from a hy… ▽ More

    Submitted 1 May, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

  32. arXiv:2503.16565  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.GN

    Gene42: Long-Range Genomic Foundation Model With Dense Attention

    Authors: Kirill Vishniakov, Boulbaba Ben Amor, Engin Tekin, Nancy A. ElNaker, Karthik Viswanathan, Aleksandr Medvedev, Aahan Singh, Maryam Nadeem, Mohammad Amaan Sayeed, Praveenkumar Kanithi, Tiago Magalhaes, Natalia Vassilieva, Dwarikanath Mahapatra, Marco Pimentel, Shadab Khan

    Abstract: We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to manage context lengths of up to 192,000 base pairs (bp) at a single-nucleotide resolution. Gene42 models utilize a decoder-only (LLaMA-style) architecture with a dense self-attention mechanism. Initially trained on fixed-length sequences of 4,096 bp, our models underwent continuous pretraining to extend the context… ▽ More

    Submitted 20 March, 2025; originally announced March 2025.

  33. arXiv:2503.15707  [pdf, other]

    cs.RO cs.AI

    Safety Aware Task Planning via Large Language Models in Robotics

    Authors: Azal Ahmad Khan, Michael Andrev, Muhammad Ali Murtaza, Sergio Aguilera, Rui Zhang, Jie Ding, Seth Hutchinson, Ali Anwar

    Abstract: The integration of large language models (LLMs) into robotic task planning has unlocked better reasoning capabilities for complex, long-horizon workflows. However, ensuring safety in LLM-driven plans remains a critical challenge, as these models often prioritize task completion over risk mitigation. This paper introduces SAFER (Safety-Aware Framework for Execution in Robotics), a multi-LLM framewo… ▽ More

    Submitted 19 March, 2025; originally announced March 2025.

  34. arXiv:2503.11101  [pdf]

    cs.CV cs.LG

    A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis

    Authors: Asifullah Khan, Laiba Asmatullah, Anza Malik, Shahzaib Khan, Hamna Asif

    Abstract: Self-supervised learning is a machine learning approach that generates implicit labels by learning underlying patterns and extracting discriminative features from unlabeled data without manual labelling. Contrastive learning introduces the concept of "positive" and "negative" samples, where positive pairs (e.g., variation of the same image/object) are brought together in the embedding space, and n… ▽ More

    Submitted 18 April, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

  35. arXiv:2503.10965  [pdf, other]

    cs.AI cs.CL cs.LG

    Auditing language models for hidden objectives

    Authors: Samuel Marks, Johannes Treutlein, Trenton Bricken, Jack Lindsey, Jonathan Marcus, Siddharth Mishra-Sharma, Daniel Ziegler, Emmanuel Ameisen, Joshua Batson, Tim Belonax, Samuel R. Bowman, Shan Carter, Brian Chen, Hoagy Cunningham, Carson Denison, Florian Dietz, Satvik Golechha, Akbir Khan, Jan Kirchner, Jan Leike, Austin Meek, Kei Nishimura-Gasparian, Euan Ong, Christopher Olah, Adam Pearce, et al. (10 additional authors not shown)

    Abstract: We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model… ▽ More

    Submitted 27 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

  36. arXiv:2503.09939  [pdf]

    cs.CR

    A Chaotic Image Encryption Scheme Using Novel Geometric Block Permutation and Dynamic Substitution

    Authors: Muhammad Ali, Jawad Ahmad, Muhammad Abdullah Hussain Khan, Safee Ullah, Mujeeb Ur Rehman, Syed Aziz Shah, Muhammad Shahbaz Khan

    Abstract: In this digital era, ensuring the security of digital data during transmission and storage is crucial. Digital data, particularly image data, needs to be protected against unauthorized access. To address this, this paper presents a novel image encryption scheme based on a confusion diffusion architecture. The diffusion module introduces a novel geometric block permutation technique, which effectiv… ▽ More

    Submitted 12 March, 2025; originally announced March 2025.

  37. arXiv:2503.09617  [pdf, other]

    cs.MA cs.CL cs.LG

    Factorio Learning Environment

    Authors: Jack Hopkins, Mart Bakler, Akbir Khan

    Abstract: Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), based on the game of Factorio, that tests agents in long-term planning, program synthesis, and resource optimization. FLE provides exponentially scaling challenges -- from basic automation to complex factories processing millions o… ▽ More

    Submitted 6 March, 2025; originally announced March 2025.

  38. arXiv:2503.09041  [pdf]

    cs.CR

    A Hybrid Neural Network with Smart Skip Connections for High-Precision, Low-Latency EMG-Based Hand Gesture Recognition

    Authors: Hafsa Wazir, Jawad Ahmad, Muazzam A. Khan, Sana Ullah Jan, Fadia Ali Khan, Muhammad Shahbaz Khan

    Abstract: Electromyography (EMG) is extensively used in key biomedical areas, such as prosthetics, and assistive and interactive technologies. This paper presents a new hybrid neural network named ConSGruNet for precise and efficient hand gesture recognition. The proposed model comprises convolutional neural networks with smart skip connections in conjunction with a Gated Recurrent Unit (GRU). The proposed… ▽ More

    Submitted 12 March, 2025; originally announced March 2025.

  39. arXiv:2503.08863  [pdf, other]

    cs.CG cs.DS

    Improved Approximation Algorithms for Three-Dimensional Bin Packing

    Authors: Debajyoti Kar, Arindam Khan, Malin Rau

    Abstract: We study three fundamental three-dimensional (3D) geometric packing problems: 3D (Geometric) Bin Packing (3D-BP), 3D Strip Packing (3D-SP), and Minimum Volume Bounding Box (3D-MVBB), where given a set of 3D (rectangular) cuboids, the goal is to find an axis-aligned nonoverlapping packing of all cuboids. In 3D-BP, we need to pack the given cuboids into the minimum number of unit cube bins. In 3D-SP… ▽ More

    Submitted 19 April, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  40. arXiv:2503.08688  [pdf, other]

    cs.CY

    Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs

    Authors: Ariba Khan, Stephen Casper, Dylan Hadfield-Menell

    Abstract: Research on the 'cultural alignment' of Large Language Models (LLMs) has emerged in response to growing interest in understanding representation across diverse stakeholders. Current approaches to evaluating cultural alignment through survey-based assessments that borrow from social science methodologies often overlook systematic robustness checks. Here, we identify and test three assumptions behin… ▽ More

    Submitted 8 April, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  41. arXiv:2503.07376  [pdf, other]

    eess.SY cs.RO

    AttentionSwarm: Reinforcement Learning with Attention Control Barier Function for Crazyflie Drones in Dynamic Environments

    Authors: Grik Tadevosyan, Valerii Serpiva, Aleksey Fedoseev, Roohan Ahmed Khan, Demetros Aschu, Faryal Batool, Nickolay Efanov, Artem Mikhaylov, Dzmitry Tsetserukou

    Abstract: We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control across three challenging environments: a landing environment with obstacles, a competitive drone game setting, and a dynamic drone racing scenario. Central to our approach is the Attention Model Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critic… ▽ More

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 6 pages, 6 figures

  42. arXiv:2503.06003  [pdf, other]

    cs.CV

    Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models

    Authors: Md Azim Khan, Aryya Gangopadhyay, Jianwu Wang, Robert F. Erbacher

    Abstract: Situational awareness applications rely heavily on real-time processing of visual and textual data to provide actionable insights. Vision language models (VLMs) have become essential tools for interpreting complex environments by connecting visual inputs with natural language descriptions. However, these models often face computational challenges, especially when required to perform efficiently in…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 8 pages, 4 figures

  43. arXiv:2503.05240  [pdf, other]

    cs.SE

    Mining Q&A Platforms for Empirical Evidence on Quantum Software Programming

    Authors: Arif Ali Khan, Boshuai Ye, Muhammad Azeem Akbar, Javed Ali Khan, Davoud Mougouei, Xinyuan Ma

    Abstract: The rise of quantum computing has driven the need for quantum software engineering, yet its programming landscape remains largely unexplored in empirical research. As quantum technologies advance toward industrial adoption, understanding programming aspects is crucial to addressing software development challenges. This study analyzes 6,935 quantum software programming discussion posts from Stack E…

    Submitted 7 March, 2025; originally announced March 2025.

  44. arXiv:2503.04928  [pdf, other]

    cs.SE

    AUTOFRAME -- A Software-driven Integration Framework for Automotive Systems

    Authors: Sven Kirchner, Nils Purschke, Chengdong Wu, Muhammed Aqib Khan, Divye Dixit, Alois C. Knoll

    Abstract: The evolution of automotive technologies towards more integrated and sophisticated systems requires a shift from traditional distributed architectures to centralized vehicle architectures. This work presents a novel framework that addresses the increasing complexity of Software Defined Vehicles (SDV) through a centralized approach that optimizes software and hardware integration. Our approach intr…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 8 pages; Accepted for publication at the 27th International Conference on Intelligent Transportation Systems (ITSC), Edmonton, Canada, September 24-27, 2024

  45. arXiv:2503.02723  [pdf, other]

    cs.RO

    ImpedanceGPT: VLM-driven Impedance Control of Swarm of Mini-drones for Intelligent Navigation in Dynamic Environment

    Authors: Faryal Batool, Malaika Zafar, Yasheerah Yaqoot, Roohan Ahmed Khan, Muhammad Haris Khan, Aleksey Fedoseev, Dzmitry Tsetserukou

    Abstract: Swarm robotics plays a crucial role in enabling autonomous operations in dynamic and unpredictable environments. However, a major challenge remains ensuring safe and efficient navigation in environments filled with both dynamic alive (e.g., humans) and dynamic inanimate (e.g., non-living objects) obstacles. In this paper, we propose ImpedanceGPT, a novel system that combines a Vision-Language Mode…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Submitted to IROS 2025

  46. arXiv:2503.01916  [pdf, other]

    quant-ph cs.CV cs.RO eess.IV

    QDCNN: Quantum Deep Learning for Enhancing Safety and Reliability in Autonomous Transportation Systems

    Authors: Ashtakala Meghanath, Subham Das, Bikash K. Behera, Muhammad Attique Khan, Saif Al-Kuwari, Ahmed Farouk

    Abstract: In transportation cyber-physical systems (CPS), ensuring safety and reliability in real-time decision-making is essential for successfully deploying autonomous vehicles and intelligent transportation networks. However, these systems face significant challenges, such as computational complexity and the ability to handle ambiguous inputs like shadows in complex environments. This paper introduces a…

    Submitted 1 March, 2025; originally announced March 2025.

    Comments: 11 Pages, 7 Figures, 4 Tables

  47. arXiv:2503.00781  [pdf]

    cs.IR cs.AI

    Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks

    Authors: Umar Ali Khan, Ekram Khan, Fiza Khan, Athar Ali Moinuddin

    Abstract: Large Language Models (LLMs) have proven immensely beneficial in education by capturing vast amounts of literature-based information, allowing them to generate context without relying on external sources. In this paper, we propose a generative AI-powered GATE question-answering framework (GATE stands for Graduate Aptitude Test in Engineering) that leverages LLMs to explain GATE solutions and suppo…

    Submitted 2 March, 2025; originally announced March 2025.

  48. arXiv:2503.00323  [pdf, other]

    cs.LG cs.AI cs.DC

    FLStore: Efficient Federated Learning Storage for non-training workloads

    Authors: Ahmad Faraz Khan, Samuel Fountain, Ahmed M. Abdelmoniem, Ali R. Butt, Ali Anwar

    Abstract: Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection, with an aggregator server coordinating training, aggregating model updates, and storing metadata across rounds. In addition to training, a substantial part of FL systems are the non-training workloads such as scheduling, personali…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: 11 pages, 19 figures, 2 tables. This paper has been accepted at the Eighth Annual Conference on Machine Learning and Systems (MLSys 2025)

  49. arXiv:2502.20825  [pdf, other]

    cs.LG cs.AI cs.DC cs.SE

    LADs: Leveraging LLMs for AI-Driven DevOps

    Authors: Ahmad Faraz Khan, Azal Ahmad Khan, Anas Mohamed, Haider Ali, Suchithra Moolinti, Sabaat Haroon, Usman Tahir, Mattia Fazzini, Ali R. Butt, Ali Anwar

    Abstract: Automating cloud configuration and deployment remains a critical challenge due to evolving infrastructures, heterogeneous hardware, and fluctuating workloads. Existing solutions lack adaptability and require extensive manual tuning, leading to inefficiencies and misconfigurations. We introduce LADs, the first LLM-driven framework designed to tackle these challenges by ensuring robustness, adaptabi…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 17 pages with Appendix, 8 figures, and 7 tables. This paper is currently Under Review

  50. arXiv:2502.15867  [pdf]

    q-bio.OT cs.AI

    Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence

    Authors: Yingying Sun, Jun A, Zhiwei Liu, Rui Sun, Liujia Qian, Samuel H. Payne, Wout Bittremieux, Markus Ralser, Chen Li, Yi Chen, Zhen Dong, Yasset Perez-Riverol, Asif Khan, Chris Sander, Ruedi Aebersold, Juan Antonio Vizcaíno, Jonathan R Krieger, Jianhua Yao, Han Wen, Linfeng Zhang, Yunping Zhu, Yue Xuan, Benjamin Boyang Sun, Liang Qiao, Henning Hermjakob, et al. (37 additional authors not shown)

    Abstract: Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights.…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 28 pages, 2 figures, perspective in AI proteomics