Search | arXiv e-print repository

arXiv:2506.20685 [pdf, ps, other]

Progressive Size-Adaptive Federated Learning: A Comprehensive Framework for Heterogeneous Multi-Modal Data Systems

Authors: Sajid Hussain, Muhammad Sohail, Nauman Ali Khan, Naima Iltaf, Ihtesham ul Islam

Abstract: Federated Learning (FL) has emerged as a transformative paradigm for distributed machine learning while preserving data privacy. However, existing approaches predominantly focus on model heterogeneity and aggregation techniques, largely overlooking the fundamental impact of dataset size characteristics on federated training dynamics. This paper introduces Size-Based Adaptive Federated Learning (SAFL), a novel progressive training framework that systematically organizes federated learning based on dataset size characteristics across heterogeneous multi-modal data. Our comprehensive experimental evaluation across 13 diverse datasets spanning 7 modalities (vision, text, time series, audio, sensor, medical vision, and multimodal) reveals critical insights: 1) an optimal dataset size range of 1000-1500 samples for federated learning effectiveness; 2) a clear modality performance hierarchy with structured data (time series, sensor) significantly outperforming unstructured data (text, multimodal); and 3) systematic performance degradation for large datasets exceeding 2000 samples. SAFL achieves an average accuracy of 87.68% across all datasets, with structured data modalities reaching 99%+ accuracy. The framework demonstrates superior communication efficiency, reducing total data transfer to 7.38 GB across 558 communications while maintaining high performance. Our real-time monitoring framework provides unprecedented insights into system resource utilization, network efficiency, and training dynamics. This work fills critical gaps in understanding how data characteristics should drive federated learning strategies, providing both theoretical insights and practical guidance for real-world FL deployments in neural network and learning systems.

Submitted 24 June, 2025; originally announced June 2025.

arXiv:2506.15911 [pdf, ps, other]

From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents

Authors: Mohammad Amaan Sayeed, Mohammed Talha Alam, Raza Imam, Shahab Saquib Sohail, Amir Hussain

Abstract: Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the agentic prompt adds another 10% improvement through deeper mechanistic insight and safety considerations. Our results demonstrate that blending classical Islamic texts with retrieval and self-evaluation enables reliable, culturally sensitive medical question-answering.

Submitted 22 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

Comments: Published at the 4th Muslims in Machine Learning (MusIML) Workshop (ICML-25)

arXiv:2506.06281 [pdf, other]

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Authors: Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Muhammad Haris Khan, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan, Salman Khan

Abstract: Modern Earth observation (EO) increasingly leverages deep learning to harness the scale and diversity of satellite imagery across sensors and regions. While recent foundation models have demonstrated promising generalization across EO tasks, many remain limited by the scale, geographical coverage, and spectral diversity of their training data, factors critical for learning globally transferable representations. In this work, we introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery, combined with large spatial tiles and land-cover aware sampling to enrich spatial and semantic coverage. By treating sensing modalities as natural augmentations in our self-supervised approach, we unify radar and optical inputs via modality-specific patch embeddings and adaptive cross-attention fusion. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism that incorporates class-frequency-aware regularization to address long-tailed distributions in land cover. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench. Our code and pretrained models are publicly available at: https://github.com/mbzuai-oryx/TerraFM

Submitted 6 June, 2025; originally announced June 2025.

arXiv:2506.05411 [pdf, ps, other]

QA-HFL: Quality-Aware Hierarchical Federated Learning for Resource-Constrained Mobile Devices with Heterogeneous Image Quality

Authors: Sajid Hussain, Muhammad Sohail, Nauman Ali Khan

Abstract: This paper introduces QA-HFL, a quality-aware hierarchical federated learning framework that efficiently handles heterogeneous image quality across resource-constrained mobile devices. Our approach trains specialized local models for different image quality levels and aggregates their features using a quality-weighted fusion mechanism, while incorporating differential privacy protection. Experiments on MNIST demonstrate that QA-HFL achieves 92.31% accuracy after just three federation rounds, significantly outperforming state-of-the-art methods like FedRolex (86.42%). Under strict privacy constraints, our approach maintains 30.77% accuracy with formal differential privacy guarantees. Counter-intuitively, low-end devices contributed most significantly (63.5%) to the final model despite using 100 fewer parameters than high-end counterparts. Our quality-aware approach addresses accuracy decline through device-specific regularization, adaptive weighting, intelligent client selection, and server-side knowledge distillation, while maintaining efficient communication with a 4.71% compression ratio. Statistical analysis confirms that our approach significantly outperforms baseline methods (p < 0.01) under both standard and privacy-constrained conditions.

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2506.01882 [pdf, ps, other]

Learning thermodynamic master equations for open quantum systems

Authors: Peter Sentz, Stanley Nicholson, Yujin Cho, Sohail Reddy, Brendan Keith, Stefanie Günther

Abstract: The characterization of Hamiltonians and other components of open quantum dynamical systems plays a crucial role in quantum computing and other applications. Scientific machine learning techniques have been applied to this problem in a variety of ways, including by modeling with deep neural networks. However, the majority of mathematical models describing open quantum systems are linear, and the natural nonlinearities in learnable models have not been incorporated using physical principles. We present a data-driven model for open quantum systems that includes learnable, thermodynamically consistent terms. The trained model is interpretable, as it directly estimates the system Hamiltonian and linear components of coupling to the environment. We validate the model on synthetic two- and three-level data, as well as experimental two-level data collected from a quantum device at Lawrence Livermore National Laboratory.

Submitted 2 June, 2025; originally announced June 2025.

Comments: 20 pages, 7 figures

Report number: LLNL-JRNL-2005344

ACM Class: I.2.6; J.2

arXiv:2505.24216 [pdf, ps, other]

Shuffle PatchMix Augmentation with Confidence-Margin Weighted Pseudo-Labels for Enhanced Source-Free Domain Adaptation

Authors: Prasanna Reddy Pulakurthi, Majid Rabbani, Jamison Heard, Sohail Dianat, Celso M. de Melo, Raghuveer Rao

Abstract: This work investigates Source-Free Domain Adaptation (SFDA), where a model adapts to a target domain without access to source data. A new augmentation technique, Shuffle PatchMix (SPM), and a novel reweighting strategy are introduced to enhance performance. SPM shuffles and blends image patches to generate diverse and challenging augmentations, while the reweighting strategy prioritizes reliable pseudo-labels to mitigate label noise. These techniques are particularly effective on smaller datasets like PACS, where overfitting and pseudo-label noise pose greater risks. State-of-the-art results are achieved on three major benchmarks: PACS, VisDA-C, and DomainNet-126. Notably, on PACS, improvements of 7.3% (79.4% to 86.7%) and 7.2% are observed in single-target and multi-target settings, respectively, while gains of 2.8% and 0.7% are attained on DomainNet-126 and VisDA-C. This combination of advanced augmentation and robust pseudo-label reweighting establishes a new benchmark for SFDA. The code is available at: https://github.com/PrasannaPulakurthi/SPM

Submitted 30 May, 2025; originally announced May 2025.

Comments: 6 pages, 3 figures, 5 tables, Accepted to IEEE ICIP 2025

arXiv:2505.23801 [pdf, ps, other]

SEMFED: Semantic-Aware Resource-Efficient Federated Learning for Heterogeneous NLP Tasks

Authors: Sajid Hussain, Muhammad Sohail, Nauman Ali Khan

Abstract: Background: Federated Learning (FL) has emerged as a promising paradigm for training machine learning models while preserving data privacy. However, applying FL to Natural Language Processing (NLP) tasks presents unique challenges due to semantic heterogeneity across clients, vocabulary mismatches, and varying resource constraints on edge devices. Objectives: This paper introduces SEMFED, a novel semantic-aware resource-efficient federated learning framework specifically designed for heterogeneous NLP tasks. Methods: SEMFED incorporates three key innovations: (1) a semantic-aware client selection mechanism that balances semantic diversity with resource constraints, (2) adaptive NLP-specific model architectures tailored to device capabilities while preserving semantic information, and (3) a communication-efficient semantic feature compression technique that significantly reduces bandwidth requirements. Results: Experimental results on various NLP classification tasks demonstrate that SEMFED achieves an 80.5% reduction in communication costs while maintaining model accuracy above 98%, outperforming state-of-the-art FL approaches. Conclusion: SEMFED effectively manages heterogeneous client environments with varying computational resources, network reliability, and semantic data distributions, making it particularly suitable for real-world federated NLP deployments.

Submitted 26 May, 2025; originally announced May 2025.

Comments: 13 pages

arXiv:2505.23792 [pdf, ps, other]

Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things

Authors: Kai Li, Conggai Li, Xin Yuan, Shenghong Li, Sai Zou, Syed Sohail Ahmed, Wei Ni, Dusit Niyato, Abbas Jamalipour, Falko Dressler, Ozgur B. Akan

Abstract: This paper focuses on Zero-Trust Foundation Models (ZTFMs), a novel paradigm that embeds zero-trust security principles into the lifecycle of foundation models (FMs) for Internet of Things (IoT) systems. By integrating core tenets, such as continuous verification, least privilege access (LPA), data confidentiality, and behavioral analytics into the design, training, and deployment of FMs, ZTFMs can enable secure, privacy-preserving AI across distributed, heterogeneous, and potentially adversarial IoT environments. We present the first structured synthesis of ZTFMs, identifying their potential to transform conventional trust-based IoT architectures into resilient, self-defending ecosystems. Moreover, we propose a comprehensive technical framework, incorporating federated learning (FL), blockchain-based identity management, micro-segmentation, and trusted execution environments (TEEs) to support decentralized, verifiable intelligence at the network edge. In addition, we investigate emerging security threats unique to ZTFM-enabled systems and evaluate countermeasures, such as anomaly detection, adversarial training, and secure aggregation. Through this analysis, we highlight key open research challenges in terms of scalability, secure orchestration, interpretable threat attribution, and dynamic trust calibration. This survey lays a foundational roadmap for secure, intelligent, and trustworthy IoT infrastructures powered by FMs.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.21811 [pdf, ps, other]

Revisiting Self-attention for Cross-domain Sequential Recommendation

Authors: Clark Mingxuan Ju, Leonardo Neves, Bhuvesh Kumar, Liam Collins, Tong Zhao, Yuwei Qiu, Qing Dou, Sohail Nizam, Sen Yang, Neil Shah

Abstract: Sequential recommendation is a popular paradigm in modern recommender systems. In particular, one challenging problem in this space is cross-domain sequential recommendation (CDSR), which aims to predict future behaviors given user interactions across multiple domains. Existing CDSR frameworks are mostly built on the self-attention transformer and seek to improve by explicitly injecting additional domain-specific components (e.g. domain-aware module blocks). While these additional components help, we argue they overlook the core self-attention module already present in the transformer, a naturally powerful tool to learn correlations among behaviors. In this work, we aim to improve the CDSR performance for simple models from a novel perspective of enhancing the self-attention. Specifically, we introduce a Pareto-optimal self-attention and formulate the cross-domain learning as a multi-objective problem, where we optimize the recommendation task while dynamically minimizing the cross-domain attention scores. Our approach automates knowledge transfer in CDSR (dubbed AutoCDSR): it not only mitigates negative transfer but also encourages complementary knowledge exchange among auxiliary domains. Based on this idea, we further introduce AutoCDSR+, a more performant variant with slight additional cost. Our proposal is easy to implement and works as a plug-and-play module that can be incorporated into existing transformer-based recommenders. Besides flexibility, it is practical to deploy because it brings little extra computational overhead and requires no heavy hyper-parameter tuning. AutoCDSR on average improves Recall@10 for SASRec and Bert4Rec by 9.8% and 16.0% and NDCG@10 by 12.0% and 16.7%, respectively. Code is available at https://github.com/snap-research/AutoCDSR.

Submitted 27 May, 2025; originally announced May 2025.

Comments: Accepted to KDD'25

arXiv:2505.19851 [pdf]

Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages

Authors: Gulfarogh Azam, Mohd Sadique, Saif Ali, Mohammad Nadeem, Erik Cambria, Shahab Saquib Sohail, Mohammad Sultan Alam

Abstract: Transliteration, the process of mapping text from one script to another, plays a crucial role in multilingual natural language processing, especially within linguistically diverse contexts such as India. Despite significant advancements through specialized models like IndicXlit, recent developments in large language models suggest a potential for general-purpose models to excel at this task without explicit task-specific training. The current work systematically evaluates the performance of prominent LLMs, including GPT-4o, GPT-4.5, GPT-4.1, Gemma-3-27B-it, and Mistral-Large against IndicXlit, a state-of-the-art transliteration model, across ten major Indian languages. Experiments utilized standard benchmarks, including Dakshina and Aksharantar datasets, with performance assessed via Top-1 Accuracy and Character Error Rate. Our findings reveal that GPT family models generally outperform other LLMs and IndicXlit in most instances. Additionally, fine-tuning GPT-4o notably improves performance on specific languages. An extensive error analysis and robustness testing under noisy conditions further elucidate the strengths of LLMs compared to specialized models, highlighting the efficacy of foundational models for a wide spectrum of specialized applications with minimal overhead.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.08056 [pdf, other]

QubitLens: An Interactive Learning Tool for Quantum State Tomography

Authors: Mohammad Aamir Sohail, R. Ranga Sudharshan, S. Sandeep Pradhan, Arvind Rao

Abstract: Quantum state tomography is a fundamental task in quantum computing, involving the reconstruction of an unknown quantum state from measurement outcomes. Although essential, it is typically introduced at the graduate level due to its reliance on advanced concepts such as the density matrix formalism, tensor product structures, and partial trace operations. This complexity often creates a barrier for students and early learners. In this work, we introduce QubitLens, an interactive visualization tool designed to make quantum state tomography more accessible and intuitive. QubitLens leverages maximum likelihood estimation (MLE), a classical statistical method, to estimate pure quantum states from projective measurement outcomes in the X, Y, and Z bases. The tool emphasizes conceptual clarity through visual representations, including Bloch sphere plots of true and reconstructed qubit states, bar charts comparing parameter estimates, and fidelity gauges that quantify reconstruction accuracy. QubitLens offers a hands-on approach to learning quantum tomography without requiring deep prior knowledge of density matrices or optimization theory. The tool supports both single- and multi-qubit systems and is intended to bridge the gap between theory and practice in quantum computing education.

Submitted 12 May, 2025; originally announced May 2025.

Comments: 7 pages, 5 figures

arXiv:2505.03409 [pdf]

Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework

Authors: Sanam Nayab, Sohail Raza Chohan, Aqsa Jameel, Syed Rehan Shah, Syed Ahsan Masud Zaidi, Aditya Nath Jha, Kamran Siddique

Abstract: Cardiovascular diseases are a leading cause of fatalities worldwide, often occurring suddenly with limited time for intervention. Current healthcare monitoring systems for cardiac patients rely heavily on hospitalization, which can be impractical for continuous monitoring. This paper presents a novel IoT-based solution for remote, real-time tracking of critical cardiac metrics, addressing the pressing need for accessible and continuous healthcare, particularly for the aging population in Pakistan. The proposed IoT kit measures essential parameters such as body temperature, heart rate (HR), blood pressure (BP), oxygen saturation (SpO2), and electrocardiography (ECG). A key innovation of the system is its integration with a cloud-based application, enabling constant remote monitoring and incorporating an alarm mechanism to alert medical professionals for timely intervention, reducing the risk of catastrophic incidents. The system was tested in a clinical environment with 20 participants, demonstrating results closely aligned with those obtained using standard medical devices. The findings validate the system's potential for reliable remote monitoring, offering a significant step forward in proactive cardiac healthcare management. This novel approach combines IoT technology with cloud-based applications to provide a cost-effective and efficient solution for reducing unexpected fatalities among cardiac patients.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 20 pages, 8063 words, 14 figures

arXiv:2505.03406 [pdf, other]

Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation

Authors: Mohammad Shoaib Ansari, Mohd Sohail Ali Khan, Shubham Revankar, Aditya Varma, Anil S. Mokhade

Abstract: This research paper investigates the application of Large Language Models (LLMs) in healthcare, specifically focusing on enhancing medical decision support through Retrieval-Augmented Generation (RAG) integrated with hospital-specific data and fine-tuning using Quantized Low-Rank Adaptation (QLoRA). The system utilizes Llama 3.2-3B-Instruct as its foundation model. By embedding and retrieving context-relevant healthcare information, the system significantly improves response accuracy. QLoRA facilitates notable parameter efficiency and memory optimization, preserving the integrity of medical information through specialized quantization techniques. Our research also shows that our model performs relatively well on various medical benchmarks, indicating that it can be used to make basic medical suggestions. This paper details the system's technical components, including its architecture, quantization methods, and key healthcare applications such as enhanced disease prediction from patient symptoms and medical history, treatment suggestions, and efficient summarization of complex medical reports. We touch on the ethical considerations (patient privacy, data security, and the need for rigorous clinical validation) as well as the practical challenges of integrating such systems into real-world healthcare workflows. Furthermore, the lightweight quantized weights ensure scalability and ease of deployment even in low-resource hospital environments. Finally, the paper concludes with an analysis of the broader impact of LLMs on healthcare and outlines future directions for LLMs in medical settings.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 12 pages

arXiv:2505.01831 [pdf, other]

Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement

Authors: Haofan Wu, Yin Huang, Yuqing Wu, Qiuyu Yang, Bingfang Wang, Li Zhang, Muhammad Fahadullah Khan, Ali Zia, M. Saleh Memon, Syed Sohail Bukhari, Abdul Fattah Memon, Daizong Ji, Ya Zhang, Ghulam Mustafa, Yin Fang

Abstract: High-quality fundus images provide essential anatomical information for clinical screening and ophthalmic disease diagnosis. Yet, due to hardware limitations, operational variability, and patient compliance, fundus images often suffer from low resolution and low signal-to-noise ratio. Recent years have witnessed promising progress in fundus image enhancement. However, existing works usually focus on restoring structural details or global characteristics of fundus images, lacking a unified image enhancement framework to recover comprehensive multi-scale information. Moreover, few methods pinpoint the target of image enhancement, e.g., lesions, which is crucial for medical image-based diagnosis. To address these challenges, we propose a multi-scale target-aware representation learning framework (MTRL-FIE) for efficient fundus image enhancement. Specifically, we propose a multi-scale feature encoder (MFE) that employs wavelet decomposition to embed both low-frequency structural information and high-frequency details. Next, we design a structure-preserving hierarchical decoder (SHD) to fuse multi-scale feature embeddings for real fundus image restoration. SHD integrates hierarchical fusion and group attention mechanisms to achieve adaptive feature fusion while retaining local structural smoothness. Meanwhile, a target-aware feature aggregation (TFA) module is used to enhance pathological regions and reduce artifacts. Experimental results on multiple fundus image datasets demonstrate the effectiveness and generalizability of MTRL-FIE for fundus image enhancement. Compared to state-of-the-art methods, MTRL-FIE achieves superior enhancement performance with a more lightweight architecture. Furthermore, our approach generalizes to other ophthalmic image processing tasks without supervised fine-tuning, highlighting its potential for clinical applications.

Submitted 3 May, 2025; originally announced May 2025.

Comments: Under review at Neural Networks

arXiv:2504.21838 [pdf, ps, other]

Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat

Authors: Clark Mingxuan Ju, Leonardo Neves, Bhuvesh Kumar, Liam Collins, Tong Zhao, Yuwei Qiu, Qing Dou, Yang Zhou, Sohail Nizam, Rengim Ozturk, Yvette Liu, Sen Yang, Manish Malik, Neil Shah

Abstract: The development of powerful user representations is a key factor in the success of recommender systems (RecSys). Online platforms employ a range of RecSys techniques to personalize user experience across diverse in-app surfaces. User representations are often learned individually through users' historical interactions within each surface, and user representations across different surfaces can be shared post-hoc as auxiliary features or additional retrieval sources. While effective, such schemes cannot directly encode collaborative filtering signals across different surfaces, hindering their capacity to discover complex relationships between user behaviors and preferences across the whole platform. To bridge this gap at Snapchat, we seek to conduct universal user modeling (UUM) across different in-app surfaces, learning general-purpose user representations which encode behaviors across surfaces. Instead of replacing domain-specific representations, UUM representations capture cross-domain trends, enriching existing representations with complementary information. This work discusses our efforts in developing initial UUM versions, practical challenges, technical choices and modeling and research directions with promising offline performance. Following successful A/B testing, UUM representations have been launched in production, powering multiple use cases and demonstrating their value. UUM embedding has been incorporated into (i) Long-form Video embedding-based retrieval, leading to 2.78% increase in Long-form Video Open Rate, (ii) Long-form Video L2 ranking, with 19.2% increase in Long-form Video View Time sum, (iii) Lens L2 ranking, leading to 1.76% increase in Lens play time, and (iv) Notification L2 ranking, with 0.87% increase in Notification Open Rate.

Submitted 9 June, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

Comments: Accepted to the industrial track of SIGIR'25

arXiv:2504.18856 [pdf, other]

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation

Authors: Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Ganapathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, Arif Mahmood

Abstract: In Computational Pathology (CPath), the introduction of Vision-Language Models (VLMs) has opened new avenues for research, focusing primarily on aligning image-text pairs at a single magnification level. However, this approach might not be sufficient for tasks like cancer subtype classification, tissue phenotyping, and survival analysis due to the limited level of detail that a single-resolution image can provide. Addressing this, we propose a novel multi-resolution paradigm leveraging Whole Slide Images (WSIs) to extract histology patches at multiple resolutions and generate corresponding textual descriptions through an advanced CPath VLM. We introduce visual-textual alignment at multiple resolutions as well as cross-resolution alignment to establish more effective text-guided visual representations. Cross-resolution alignment using a multimodal encoder enhances the model's ability to capture context from multiple resolutions in histology images. Supported by novel loss functions, our model aims to capture a broader range of information, enrich feature representation, improve discriminative ability, and enhance generalization across different resolutions. Pre-trained on a comprehensive TCGA dataset with 34 million image-language pairs at various resolutions, our fine-tuned model outperforms state-of-the-art (SOTA) counterparts across multiple datasets and tasks, demonstrating its effectiveness in CPath. The code is available on GitHub at: https://github.com/BasitAlawode/MR-PLIP

Submitted 26 April, 2025; originally announced April 2025.

arXiv:2504.15310 [pdf]

doi 10.1016/j.engappai.2024.109474

Power Transformer Health Index and Life Span Assessment: A Comprehensive Review of Conventional and Machine Learning based Approaches

Authors: Syeda Tahreem Zahra, Syed Kashif Imdad, Sohail Khan, Sohail Khalid, Nauman Anwar Baig

Abstract: Power transformers play a critical role within the electrical power system, making their health assessment and the prediction of their remaining lifespan paramount for the purpose of ensuring efficient operation and facilitating effective maintenance planning. This paper undertakes a comprehensive examination of existent literature, with a primary focus on both conventional and cutting-edge techniques employed within this domain. The merits and demerits of recent methodologies and techniques are subjected to meticulous scrutiny and explication. Furthermore, this paper expounds upon intelligent fault diagnosis methodologies and delves into the most widely utilized intelligent algorithms for the assessment of transformer conditions. Diverse Artificial Intelligence (AI) approaches, including Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest (RF), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO), are elucidated, offering pragmatic solutions for enhancing the performance of transformer fault diagnosis. The amalgamation of multiple AI methodologies and the exploration of time-series analysis further contribute to the augmentation of diagnostic precision and the early detection of faults in transformers. By furnishing a comprehensive panorama of AI applications in the field of transformer fault diagnosis, this study lays the groundwork for future research endeavors and the progression of this critical area of study.

Submitted 19 April, 2025; originally announced April 2025.

arXiv:2504.13240 [pdf, other]

Response to recent comments on Phys. Rev. B 107, 245423 (2023) and Subsection S4.3 of the Supp. Info. for Nature 638, 651-655 (2025)

Authors: Morteza Aghaee, Zulfi Alam, Mariusz Andrzejczuk, Andrey E. Antipov, Mikhail Astafev, Amin Barzegar, Bela Bauer, Jonathan Becker, Umesh Kumar Bhaskar, Alex Bocharov, Srini Boddapati, David Bohn, Jouri Bommer, Leo Bourdet, Samuel Boutin, Benjamin J. Chapman, Sohail Chatoor, Anna Wulff Christensen, Patrick Codd, William S. Cole, Paul Cooper, Fabiano Corsetti, Ajuan Cui, Andreas Ekefjärd, Saeed Fallahi, et al. (105 additional authors not shown)

Abstract: The topological gap protocol (TGP) is a statistical test designed to identify a topological phase with high confidence and without human bias. It is used to determine a promising parameter regime for operating topological qubits. The protocol's key metric is the probability of incorrectly identifying a trivial region as topological, referred to as the false discovery rate (FDR). Two recent manuscripts [arXiv:2502.19560, arXiv:2503.08944] engage with the topological gap protocol and its use in Phys. Rev. B 107, 245423 (2023) and Subsection S4.3 of the Supplementary Information for Nature 638, 651-655 (2025), although they do not explicitly dispute the main results of either one. We demonstrate that the objections in arXiv:2502.19560 and arXiv:2503.08944 are unfounded, and we uphold the conclusions of Phys. Rev. B 107, 245423 (2023) and Nature 638, 651-655 (2025). Specifically, we show that no flaws have been identified in our estimate of the false discovery rate (FDR). We provide a point-by-point rebuttal of the comments in arXiv:2502.19560 and arXiv:2503.08944.

Submitted 17 April, 2025; originally announced April 2025.

Comments: Response to arXiv:2502.19560 and arXiv:2503.08944. 11 pages, 5 figures, 2 tables, code for reproduction

arXiv:2504.13077 [pdf, ps, other]

doi 10.1117/12.3058627

Effective Dual-Region Augmentation for Reduced Reliance on Large Amounts of Labeled Data

Authors: Prasanna Reddy Pulakurthi, Majid Rabbani, Celso M. de Melo, Sohail A. Dianat, Raghuveer M. Rao

Abstract: This paper introduces a novel dual-region augmentation approach designed to reduce reliance on large-scale labeled datasets while improving model robustness and adaptability across diverse computer vision tasks, including source-free domain adaptation (SFDA) and person re-identification (ReID). Our method performs targeted data transformations by applying random noise perturbations to foreground objects and spatially shuffling background patches. This effectively increases the diversity of the training data, improving model robustness and generalization. Evaluations on the PACS dataset for SFDA demonstrate that our augmentation strategy consistently outperforms existing methods, achieving significant accuracy improvements in both single-target and multi-target adaptation settings. By augmenting training data through structured transformations, our method enables model generalization across domains, providing a scalable solution for reducing reliance on manually annotated datasets. Furthermore, experiments on Market-1501 and DukeMTMC-reID datasets validate the effectiveness of our approach for person ReID, surpassing traditional augmentation techniques. The code is available at https://github.com/PrasannaPulakurthi/Foreground-Background-Augmentation

Submitted 3 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: 9 pages, 2 figures, 4 tables, Accepted to SPIE DSC 2025 Conference: Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications III

Journal ref: Proc. SPIE 13459, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications III, 134590I (2025)

arXiv:2504.11535 [pdf, other]

Generating three transparency windows, Fano-resonance and slow/fast light in magnomechanical system through an auxiliary microwave cavity

Authors: M'bark Amghar, Noura Chabar, Mohamed Amazioug, Amjad Sohail Shah

Abstract: In this paper, we propose a theoretical scheme to investigate the magnomechanically induced transparency (MMIT) phenomenon, Fano resonances, and slow/fast light effects in a hybrid cavity magnomechanical system. The magnomechanical system consists of two cavities: the principal cavity contains two ferromagnetic yttrium iron garnet (YIG) spheres, and the auxiliary cavity contains an atomic assembly. These two cavities are connected via photon tunneling, with the principal cavity being driven by two electromagnetic fields. The photon-magnon and phonon-magnon couplings are responsible for the magnon-induced transparency (MIT) and MMIT observed in the probe output spectrum. Furthermore, we examine the impacts of tunneling coupling, atom-photon coupling, and the magnetic field on the absorption, dispersion, and transmission spectra. We provide an explanation of the mechanism behind the Fano resonance phenomenon. Additionally, we address the phenomenon of slow and fast light propagation. Moreover, we demonstrate that the group delay of the probe field can be improved by increasing the photon tunneling strength. We also show that the slow light profile is decreased by adjusting the atom-photon coupling strength. This model is experimentally feasible. We hope these findings can be applied to the processing of quantum information and communication.

Submitted 15 April, 2025; originally announced April 2025.

arXiv:2503.22147 [pdf, other]

Characterizing Non-Markovian Dynamics of Open Quantum Systems

Authors: Sohail Reddy

Abstract: Characterizing non-Markovian quantum dynamics is essential for accurately modeling open quantum systems, particularly in near-term quantum technologies. In this work, we develop a structure-preserving approach to characterizing non-Markovian evolution using the time-convolutionless (TCL) master equation, considering both linear and nonlinear formulations. To parameterize the master equation, we explore two distinct techniques: the Karhunen-Loeve (KL) expansion, which provides an optimal basis representation of the dynamics, and neural networks, which offer a data-driven approach to learning system-environment interactions. We demonstrate our methodology using experimental data from a superconducting qubit at the Quantum Device Integration Testbed (QuDIT) at Lawrence Livermore National Laboratory (LLNL). Our results show that while neural networks can capture complex dependencies, the KL expansion yields the most accurate predictions of the qubit's non-Markovian dynamics, highlighting its effectiveness in structure-preserving quantum system characterization. These findings provide valuable insights into efficient modeling strategies for open quantum systems, with implications for quantum control and error mitigation in near-term quantum processors.

Submitted 28 March, 2025; originally announced March 2025.

Report number: LLNL-JRNL-2003782

arXiv:2503.17300 [pdf, ps, other]

Variational Tail Bounds for Norms of Random Vectors and Matrices

Authors: Sohail Bahmani

Abstract: We propose a variational tail bound for norms of random vectors under moment assumptions on their one-dimensional marginals. We also propose a simplified version of the bound that parametrizes the "aggregating" distribution in the proposed variational bound by considering a certain pushforward of the Gaussian distribution. Furthermore, we show that the proposed method recovers some of the well-known bounds on norms of Gaussian random vectors, as well as a recent concentration inequality for the spectral norm of a sum of independent and identically distributed positive semidefinite matrices.

Submitted 21 March, 2025; originally announced March 2025.

arXiv:2503.13830 [pdf, other]

Hierarchical Gaussian Random Fields for Multilevel Markov Chain Monte Carlo: Coupling Stochastic Partial Differential Equation and The Karhunen-Loève Decomposition

Authors: Sohail Reddy

Abstract: This work introduces structure-preserving hierarchical decompositions for sampling Gaussian random fields (GRF) within the context of multilevel Bayesian inference in high-dimensional space. Existing scalable hierarchical sampling methods, such as those based on stochastic partial differential equation (SPDE), often reduce the dimensionality of the sample space at the cost of accuracy of inference. Other approaches, such as those based on Karhunen-Loève (KL) expansions, offer sample space dimensionality reduction but sacrifice GRF representation accuracy and ergodicity of the Markov Chain Monte Carlo (MCMC) sampler, and are computationally expensive for high-dimensional problems. The proposed method integrates the dimensionality reduction capabilities of KL expansions with the scalability of stochastic partial differential equation (SPDE)-based sampling, thereby providing a robust, unified framework for high-dimensional uncertainty quantification (UQ) that is scalable, accurate, preserves ergodicity, and offers dimensionality reduction of the sample space. The hierarchy in our multilevel algorithm is derived from the geometric multigrid hierarchy. By constructing a hierarchical decomposition that maintains the covariance structure across the levels in the hierarchy, the approach enables efficient coarse-to-fine sampling while ensuring that all samples are drawn from the desired distribution.
The proposed method is demonstrated on a benchmark subsurface flow problem, where it improves computational efficiency and statistical accuracy. Our proposed technique is more efficient, accurate, and displays better convergence properties than existing methods for high-dimensional Bayesian inference problems.

Submitted 17 March, 2025; originally announced March 2025.

arXiv:2503.09751 [pdf, ps, other]

Light Drag in a Cavity Magnomechanics

Authors: Amjad Sohail, Hazrat Ali, Khalid Naseer, Rizwan Ahmed

Abstract: The term "light dragging" describes how the trajectory of light changes as it travels through a moving medium. This phenomenon facilitates the precise detection of incredibly slow speeds of light, which is widely used in quantum gate operations, state transfer, and quantum memory implementations. To the best of our knowledge, this is the first proposal of a light-dragging effect in a magnomechanical system (MMS). The effect originates from nonlinear dipole and magnetostrictive interactions in the MMS. Magnomechanical characteristics such as magnon-photon and magnon-phonon couplings have a strong impact on both refractive and group index profile spectra. We also show that lateral light drag depends strongly on detuning by altering the amplitude and direction of the translational velocity. This enabled us to alter the light's propagation within the magnomechanical system from superluminal to subluminal and vice versa by adjusting the probe's detuning. The ability to control and manipulate the light drag through the MMS could be helpful in designing novel devices with improved functionality at the microscopic scale.

Submitted 12 March, 2025; originally announced March 2025.

Comments: 10 pages, 7 figures

arXiv:2503.08239 [pdf, other]

EnergyFormer: Energy Attention with Fourier Embedding for Hyperspectral Image Classification

Authors: Saad Sohail, Muhammad Usama, Usman Ghous, Manuel Mazzara, Salvatore Distefano, Muhammad Ahmad

Abstract: Hyperspectral imaging (HSI) provides rich spectral-spatial information across hundreds of contiguous bands, enabling precise material discrimination in applications such as environmental monitoring, agriculture, and urban analysis. However, the high dimensionality and spectral variability of HSI data pose significant challenges for feature extraction and classification. This paper presents EnergyFormer, a transformer-based framework designed to address these challenges through three key innovations: (1) Multi-Head Energy Attention (MHEA), which optimizes an energy function to selectively enhance critical spectral-spatial features, improving feature discrimination; (2) Fourier Position Embedding (FoPE), which adaptively encodes spectral and spatial dependencies to reinforce long-range interactions; and (3) Enhanced Convolutional Block Attention Module (ECBAM), which selectively amplifies informative wavelength bands and spatial structures, enhancing representation learning. Extensive experiments on the WHU-Hi-HanChuan, Salinas, and Pavia University datasets demonstrate that EnergyFormer achieves exceptional overall accuracies of 99.28%, 98.63%, and 98.72%, respectively, outperforming state-of-the-art CNN, transformer, and Mamba-based models. The source code will be made available at https://github.com/mahmad000.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2503.05687 [pdf, other]

Quantum gas microscopy of three-flavor Hubbard systems

Authors: Jirayu Mongkolkiattichai, Liyu Liu, Sohail Dasgupta, Kaden R. A. Hazzard, Peter Schauss

Abstract: Hubbard systems are paradigmatic realizations of strongly correlated many-body systems. Introducing additional species breaks the SU(2) symmetry of the Hubbard model and leads to a wide variety of novel exotic quantum phases. Three-component fermionic systems are at the heart of model systems for quantum chromodynamics where the three components reflect the three flavors. Here, we extend quantum gas microscopy to three-flavor Fermi lattice gases in the Hubbard regime. Relying on site- and flavor-resolved detection, we study the phase diagram of the three-flavor Hubbard model and find signatures of flavor-selective localization and selective pairing at temperatures down to the tunneling energy scale. Our measurements are compared with numerical linked-cluster expansion calculations. Further increase of phase space density may enable the observation of a novel pair Mott phase at half filling, and shows a path towards the study of color superfluidity and other aspects of quantum chromodynamics.

Submitted 7 March, 2025; originally announced March 2025.

Comments: 15 pages, 17 figures

arXiv:2503.03432 [pdf, ps, other]

Light drag in an Optomechanical system

Authors: Hazrat Ali, Nadia Boutabba, Amjad Sohail

Abstract: Light dragging refers to the change in the path of light passing through a moving medium. This effect enables accurate detection of very slow speeds of light, which have prominent applications in state transfer, quantum gate operations, and quantum memory implementations. Here, to the best of our knowledge, we demonstrate the existence of the light-dragging effect in an optomechanical system (OMS) for the first time. The effect arises from the nonlinear effects linked to optomechanical-induced transparency (OMIT). Hence, we observe prominent effects in the group and refractive index profile spectra related to optomechanical parameters such as the decay rate of the cavity field, the mirror's damping momentum rate, and mechanical frequency. We find that lateral light drag depends on the detuning by altering the amplitude and direction of the translational velocity. This allowed us to change the light's propagation through the optomechanical cavity from superluminal to subluminal and vice versa by modifying the probe's detuning. The ability to manipulate and control the light drag through an optomechanical system might be useful in designing novel optical devices and systems with enhanced performance.

Submitted 5 March, 2025; originally announced March 2025.

Comments: 12 pages, 8 figures

arXiv:2503.00723 [pdf, other]

Re-Imagining Multimodal Instruction Tuning: A Representation View

Authors: Yiyang Liu, James Chenhao Liang, Ruixiang Tang, Yugyung Lee, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Lifu Huang, Dongfang Liu, Qifan Wang, Cheng Han

Abstract: Multimodal instruction tuning has proven to be an effective strategy for achieving zero-shot generalization by fine-tuning pre-trained Large Multimodal Models (LMMs) with instruction-following data. However, as the scale of LMMs continues to grow, fully fine-tuning these models has become highly parameter-intensive. Although Parameter-Efficient Fine-Tuning (PEFT) methods have been introduced to reduce the number of tunable parameters, a significant performance gap remains compared to full fine-tuning. Furthermore, existing PEFT approaches are often highly parameterized, making them difficult to interpret and control. In light of this, we introduce Multimodal Representation Tuning (MRT), a novel approach that focuses on directly editing semantically rich multimodal representations to achieve strong performance and provide intuitive control over LMMs. Empirical results show that our method surpasses current state-of-the-art baselines with significant performance gains (e.g., 1580.40 MME score) while requiring substantially fewer tunable parameters (e.g., 0.03% parameters). Additionally, we conduct experiments on editing instrumental tokens within multimodal representations, demonstrating that direct manipulation of these representations enables simple yet effective control over network behavior.

Submitted 20 March, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

arXiv:2502.16630 [pdf, other]

Probing a Quarkophobic ${\mathbf{W}}^\prime$ at the High-Luminosity LHC via Vector Boson Fusion and Lorentz-Equivariant Point Cloud Learning

Authors: U. S. Qureshi, A. Gurrola, J. D. Ruiz-Álvarez

Abstract: The addition of a heavy charged vector gauge boson ${\mathbf{W}}^\prime$ to the Standard Model (SM) with negligible quark couplings ("quarkophobic") and triple gauge couplings can address issues with the SM, such as the B-meson anomalies and recent discrepancies in the W boson mass measurements. We present a phenomenology study probing ${\mathbf{W}}^\prime$ production through weak boson fusion in proton-proton collisions at the Large Hadron Collider. We operate under a simplified model with a large ${\mathbf{W}}^\prime$ decay width and consider final states with two jets, large missing transverse momentum, and one light lepton. Notably, we use point cloud learning for the first time in a BSM search (specifically, a novel Lorentz-Equivariant Geometric Algebra Transformer), providing significant improvement in signal sensitivity compared to traditional methods.

Submitted 23 February, 2025; originally announced February 2025.

arXiv:2502.14949 [pdf, ps, other]

KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

Authors: Ahmed Heakl, Abdullah Sohail, Mukul Ranjan, Rania Hossam, Ghazi Shazan Ahmad, Mohamed El-Geish, Omar Maher, Zhiqiang Shen, Fahad Khan, Salman Khan

Abstract: With the growing adoption of Retrieval-Augmented Generation (RAG) in document processing, robust text recognition has become increasingly critical for knowledge extraction. While OCR (Optical Character Recognition) for English and other languages benefits from large datasets and well-established benchmarks, Arabic OCR faces unique challenges due to its cursive script, right-to-left text flow, and complex typographic and calligraphic features. We present KITAB-Bench, a comprehensive Arabic OCR benchmark that fills the gaps in current evaluation systems. Our benchmark comprises 8,809 samples across 9 major domains and 36 sub-domains, encompassing diverse document types including handwritten text, structured tables, and specialized coverage of 21 chart types for business intelligence. Our findings show that modern vision-language models (such as GPT-4o, Gemini, and Qwen) outperform traditional OCR approaches (like EasyOCR, PaddleOCR, and Surya) by an average of 60% in Character Error Rate (CER). Furthermore, we highlight significant limitations of current Arabic OCR models, particularly in PDF-to-Markdown conversion, where the best model Gemini-2.0-Flash achieves only 65% accuracy. This underscores the challenges in accurately recognizing Arabic text, including issues with complex fonts, numeral recognition errors, word elongation, and table structure detection. This work establishes a rigorous evaluation framework that can drive improvements in Arabic document analysis methods and bridge the performance gap with English OCR technologies.

Submitted 27 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

Comments: 17 pages, 5 figures, ACL 2025

arXiv:2502.12252 [pdf, other]

Roadmap to fault tolerant quantum computation using topological qubit arrays

Authors: David Aasen, Morteza Aghaee, Zulfi Alam, Mariusz Andrzejczuk, Andrey Antipov, Mikhail Astafev, Lukas Avilovas, Amin Barzegar, Bela Bauer, Jonathan Becker, Juan M. Bello-Rivas, Umesh Bhaskar, Alex Bocharov, Srini Boddapati, David Bohn, Jouri Bommer, Parsa Bonderson, Jan Borovsky, Leo Bourdet, Samuel Boutin, Tom Brown, Gary Campbell, Lucas Casparis, Srivatsa Chakravarthi, Rui Chao, et al. (157 additional authors not shown)

Abstract: We describe a concrete device roadmap towards a fault-tolerant quantum computing architecture based on noise-resilient, topologically protected Majorana-based qubits. Our roadmap encompasses four generations of devices: a single-qubit device that enables a measurement-based qubit benchmarking protocol; a two-qubit device that uses measurement-based braiding to perform single-qubit Clifford operations; an eight-qubit device that can be used to show an improvement of a two-qubit operation when performed on logical qubits rather than directly on physical qubits; and a topological qubit array supporting lattice surgery demonstrations on two logical qubits. Devices that enable this path require a superconductor-semiconductor heterostructure that supports a topological phase, quantum dots and coupling between those quantum dots that can create the appropriate loops for interferometric measurements, and a microwave readout system that can perform fast, low-error single-shot measurements. We describe the key design components of these qubit devices, along with the associated protocols for demonstrations of single-qubit benchmarking, Clifford gate execution, quantum error detection, and quantum error correction, which differ greatly from those in more conventional qubits. Finally, we comment on implications and advantages of this architecture for utility-scale quantum computation.

Submitted 7 April, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

Comments: v2: 12+8 pages, 9+5 figures, significant main text revisions, added appendices discussing idle coherence times and non-Clifford operations; v1: 11+6 pages, 8+5 figures

arXiv:2502.12129 [pdf, other]

When Wyner and Ziv Met Bayes in Quantum-Classical Realm

Authors: Mohammad Aamir Sohail, Touheed Anwar Atif, S. Sandeep Pradhan

Abstract: In this work, we address the lossy quantum-classical source coding with the quantum side-information (QC-QSI) problem. The task is to compress the classical information about a quantum source, obtained after performing a measurement while incurring a bounded reconstruction error. Here, the decoder is allowed to use the side information to recover the classical data obtained from measurements on the source states. We introduce a new formulation based on a backward (posterior) channel, replacing the single-letter distortion observable with a single-letter posterior channel to capture reconstruction error. Unlike the rate-distortion framework, this formulation imposes a block error constraint. An analogous formulation is developed for the lossy classical source coding with classical side information (C-CSI) problem. We derive an inner bound on the asymptotic performance limit in terms of single-letter quantum and classical mutual information quantities of the given posterior channel for QC-QSI and C-CSI cases, respectively. Furthermore, we establish a connection between rate-distortion and rate-channel theory, showing that a rate-channel compression protocol attains the optimal rate-distortion function for a specific distortion measure and level.

Submitted 17 February, 2025; originally announced February 2025.

arXiv:2502.06040 [pdf, ps, other]

Perfect Transfer of Entanglement and One-Way Quantum Steering via Parametric Frequency Converter in a Two-mode Cavity Magnomechanical System

Authors: Amjad Sohail, Allah Nawaz, Hazrat Ali, Rizwan Ahmed, Marcos Cesar de Oliveira

Abstract: We study the effects of a parametric frequency converter in a two-mode cavity system where one of the cavity modes is coupled with yttrium iron garnet (YIG) via magnetic dipole interaction. The parametric frequency converter acts as a nonlinear source for enhanced entanglement among all bipartitions and asymmetrical quantum steering. The behavior of the two types of quantum correlations is shown to be dependent on parametric coupling and the associated phase factor. We show that cavity-cavity entanglement and cavity-phonon entanglement (cavity-magnon entanglement) decrease (increases) with the increase of the parametric phase factor φ. In addition, generated entanglements in the present system have been shown to be more robust against the thermal effects, with the inclusion of the parametric converter, as compared with the bare cavity case. Another intriguing finding is the asymmetric one-way steering, where we notice that magnon and phonon modes can steer the indirectly coupled cavity modes, yet the steering in the swapped direction is not observed. It is of great interest that the perfect transfer of entanglement and quantum steering is achieved among different modes by adjusting the system's parameters. In fact, our protocol for these transferring processes suggests a different approach to the processing and storage of quantum information.

Submitted 9 February, 2025; originally announced February 2025.

Comments: 12 pages, 7 figures

arXiv:2502.05272 [pdf, ps, other]

Phase-Sensitive Enhanced Absorption, Transmission and Slow Light in a Cross-cavity Magnomechanical System

Authors: Amjad Sohail, Hazrat Ali, K. B. Emale, Mohamed Amazioug, Rizwan Ahmed

Abstract: We theoretically propose a scheme to explore the magnetically and magnomechanically induced transparency phenomena in a cross-cavity magnomechanical system, focusing on the role of relative phase and the intensity of the two probing fields in enhancing the absorption and transmission spectra and manipulating the group delay of the transmitted light. Interestingly, the relative phase of the two probe fields could have overwhelming effects on both the absorption spectrum and the group delay of the output field. Tuning the relative phase and amplitude of the probe fields can suppress or enhance the absorption and transmission spectra. The combined effect of the magnon-photon and magnon-phonon couplings, along with relative phase modulations, helps to switch the probe field's behavior from subluminal to superluminal in the current system. The current study offers a straightforward and practical approach, demonstrating the capability to employ the relative phase for the modulation of microwave signals within the cavity magnomechanical system, providing insights for the design of information transduction and quantum sensing.

Submitted 7 February, 2025; originally announced February 2025.

Comments: 9 pages, 9 figures

arXiv:2501.14045 [pdf, other]

doi 10.1016/j.physb.2025.417313

Nonreciprocal entanglement in a molecular optomechanical system

Authors: E. Kongkui Berinyuy, Jia-Xin Peng, A. Sohail, P. Djorwe, A. -H. Abdel-Aty, N. Alessa, K. S. Nisar, S. G. Nana Engo

Abstract: We propose a theoretical scheme to generate nonreciprocal bipartite entanglement between a cavity mode and vibrational modes in a molecular cavity optomechanical system. Our system consists of $\mathcal{N}$ molecules placed inside a spinning whispering-gallery-mode (WGM) resonator. The vibrational modes of these molecules are coupled to the WGM resonator mode (which is analogous to a plasmonic cavity) and the resonator is also coupled to an auxiliary optical cavity. We demonstrate that nonreciprocal photon-vibration entanglement and nonreciprocal vibration-vibration entanglement can be generated in this system, even at high temperatures. These nonreciprocal entanglements arise due to the Sagnac-Fizeau effect induced by the spinning WGM resonator. We find that spinning the WGM resonator in the counter-clockwise (CCW) direction enhances both types of nonreciprocal entanglement, especially under blue-detuned driving of the optical cavity mode. Furthermore, we show that vibration-vibration entanglement can be significantly enhanced by increasing the number of molecules. Our findings have potential applications in quantum information transmission and in the development of nonreciprocal quantum devices.

Submitted 25 May, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.12433 [pdf]

Owls are wise and foxes are unfaithful: Uncovering animal stereotypes in vision-language models

Authors: Tabinda Aman, Mohammad Nadeem, Shahab Saquib Sohail, Mohammad Anas, Erik Cambria

Abstract: Animal stereotypes are deeply embedded in human culture and language. They often shape our perceptions and expectations of various species. Our study investigates how animal stereotypes manifest in vision-language models during the task of image generation. Through targeted prompts, we explore whether DALL-E perpetuates stereotypical representations of animals, such as "owls as wise," "foxes as unfaithful," etc. Our findings reveal significant stereotyped instances where the model consistently generates images aligned with cultural biases. The current work is the first of its kind to examine animal stereotyping in vision-language models systematically and to highlight a critical yet underexplored dimension of bias in AI-generated visual content.

Submitted 29 April, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

arXiv:2501.01987 [pdf]

Gender Bias in Text-to-Video Generation Models: A case study of Sora

Authors: Mohammad Nadeem, Shahab Saquib Sohail, Erik Cambria, Björn W. Schuller, Amir Hussain

Abstract: The advent of text-to-video generation models has revolutionized content creation as it produces high-quality videos from textual prompts. However, concerns regarding inherent biases in such models have prompted scrutiny, particularly regarding gender representation. Our study investigates the presence of gender bias in OpenAI's Sora, a state-of-the-art text-to-video generation model. We uncover significant evidence of bias by analyzing the generated videos from a diverse set of gender-neutral and stereotypical prompts. The results indicate that Sora disproportionately associates specific genders with stereotypical behaviors and professions, which reflects societal prejudices embedded in its training data.

Submitted 10 January, 2025; v1 submitted 30 December, 2024; originally announced January 2025.

Comments: 7 pages, 3 figures

arXiv:2501.01737 [pdf, other]

doi 10.1109/ACCESS.2024.3502918

DSLR-CNN: Efficient CNN Acceleration using Digit-Serial Left-to-Right Arithmetic

Authors: Malik Zohaib Nisar, Muhammad Sohail Ibrahim, Saeid Gorgin, Muhammad Usman, Jeong-A Lee

Abstract: Digit-serial arithmetic has emerged as a viable approach for designing hardware accelerators, reducing interconnections, area utilization, and power consumption. However, conventional methods suffer from performance and latency issues. To address these challenges, we propose an accelerator design using left-to-right (LR) arithmetic, which performs computations in a most-significant digit first (MSDF) manner, enabling digit-level pipelining. This leads to substantial performance improvements and reduced latency. The processing engine is designed for convolutional neural networks (CNNs) and includes low-latency LR multipliers and adders for digit-level parallelism. The proposed DSLR-CNN is implemented in Verilog and synthesized with Synopsys Design Compiler using GSCL 45nm technology. The DSLR-CNN accelerator was evaluated on AlexNet, VGG-16, and ResNet-18 networks. Results show significant improvements across key performance evaluation metrics, including response time, peak performance, power consumption, operational intensity, area efficiency, and energy efficiency. The peak performance measured in GOPS of the proposed design is 4.37x to 569.11x higher than contemporary designs, and it achieved 3.58x to 44.75x higher peak energy efficiency (TOPS/W), outperforming conventional bit-serial designs.

Submitted 3 January, 2025; originally announced January 2025.

Comments: Published in IEEE Access Volume 12, 2024

Journal ref: IEEE Access (2024)

arXiv:2412.17844 [pdf]

doi 10.1109/RoboSoft48309.2020.9116035

Low-cost foil/paper based touch mode pressure sensing element as artificial skin module for prosthetic hand

Authors: Rishabh B. Mishra, Sherjeel M. Khan, Sohail F. Shaikh, Aftab M. Hussain, Muhammad M. Hussain

Abstract: Capacitive pressure sensors have several advantages in areas such as robotics, automation, aerospace, biomedical and consumer electronics. We present mathematical modelling, finite element analysis (FEA), fabrication and experimental characterization of ultra-low cost and paper-based, touch-mode, flexible capacitive pressure sensor element using Do-It-Yourself (DIY) technology. The pressure sensing element is utilized to design large-area electronics skin for low-cost prosthetic hands. The presented sensor is characterized in normal, transition, touch and saturation modes. The sensor has higher sensitivity and linearity in touch mode operation from 10 to 40 kPa of applied pressure compared to the normal (0 to 8 kPa), transition (8 to 10 kPa) and saturation mode (after 40 kPa) with response time of 15.85 ms. Advantages of the presented sensor are higher sensitivity, linear response, less diaphragm area, less von Mises stress at the clamped edges region, low temperature drift, robust structure and less separation gap for large pressure measurement compared to normal mode capacitive pressure sensors. The linear range of pressure change is utilized for controlling the position of a servo motor for precise movement in robotic arm using wireless communication, which can be utilized for designing skin-like structure for low-cost prosthetic hands.

Submitted 18 December, 2024; originally announced December 2024.

arXiv:2412.16119 [pdf, other]

Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts

Authors: Muhammad Abdullah Sohail, Salaar Masood, Hamza Iqbal

Abstract: This study investigates the potential of Large Language Models (LLMs), particularly GPT-4o, for Optical Character Recognition (OCR) in low-resource scripts such as Urdu, Albanian, and Tajik, with English serving as a benchmark. Using a meticulously curated dataset of 2,520 images incorporating controlled variations in text length, font size, background color, and blur, the research simulates diverse real-world challenges. Results emphasize the limitations of zero-shot LLM-based OCR, particularly for linguistically complex scripts, highlighting the need for annotated datasets and fine-tuned models. This work underscores the urgency of addressing accessibility gaps in text digitization, paving the way for inclusive and robust OCR solutions for underserved languages.

Submitted 20 December, 2024; originally announced December 2024.

arXiv:2412.15190 [pdf, other]

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Authors: Sagar Soni, Akshay Dudhane, Hiyam Debary, Mustansar Fiaz, Muhammad Akhtar Munir, Muhammad Sohail Danish, Paolo Fraccaro, Campbell D Watson, Levente J Klein, Fahad Shahbaz Khan, Salman Khan

Abstract: Automated analysis of vast Earth observation data via interactive Vision-Language Models (VLMs) can unlock new opportunities for environmental monitoring, disaster response, and resource management. Existing generic VLMs do not perform well on Remote Sensing data, while the recent Geo-spatial VLMs remain restricted to a fixed resolution and few sensor modalities. In this paper, we introduce EarthDial, a conversational assistant specifically designed for Earth Observation (EO) data, transforming complex, multi-sensory Earth observations into interactive, natural language dialogues. EarthDial supports multi-spectral, multi-temporal, and multi-resolution imagery, enabling a wide range of remote sensing tasks, including classification, detection, captioning, question answering, visual reasoning, and visual grounding. To achieve this, we introduce an extensive instruction tuning dataset comprising over 11.11M instruction pairs covering RGB, Synthetic Aperture Radar (SAR), and multispectral modalities such as Near-Infrared (NIR) and infrared. Furthermore, EarthDial handles bi-temporal and multi-temporal sequence analysis for applications like change detection. Our extensive experimental results on 44 downstream datasets demonstrate that EarthDial outperforms existing generic and domain-specific models, achieving better generalization across various EO tasks. Our source codes and pre-trained models are at https://github.com/hiyamdebary/EarthDial.

Submitted 7 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

arXiv:2412.14917 [pdf, ps, other]

Undecidability in the Ramsey theory of polynomial equations and Hilbert's tenth problem

Authors: Sohail Farhangi, Steve Jackson, Bill Mance

Abstract: We show that several sets of interest arising from the study of partition regularity and density Ramsey theory of polynomial equations over integral domains are undecidable. In particular, we show that the set of homogeneous polynomials $p \in \mathbb{Z}[x_1,\cdots,x_n]$ for which the equation $p(x_1,\cdots,x_n) = 0$ is partition regular over $\mathbb{Z}\setminus\{0\}$ is undecidable conditional on Hilbert's tenth problem for $\mathbb{Q}$. For other integral domains, we get the analogous result unconditionally. More generally, we determine the exact lightface complexity of the various sets of interest. For example, we show that the set of homogeneous polynomials $p \in \mathbb{F}_q(t)[x_1,\cdots,x_n]$ for which the equation $p(x_1,\cdots,x_n) = 0$ is partition regular over $\mathbb{F}_q(t)\setminus\{0\}$ is $\Pi_2^0$-complete. We also prove several other results of independent interest. These include a compactness principle and a uniformity principle for density Ramsey theory on countable cancellative left amenable semigroups, as well as the existence of the natural extension for measure preserving systems of countable cancellative left reversible semigroups.

Submitted 9 May, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

Comments: 44 pages

arXiv:2412.13724 [pdf, other]

USEFUSE: Uniform Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks

Authors: Muhammad Sohail Ibrahim, Muhammad Usman, Jeong-A Lee

Abstract: Convolutional Neural Networks (CNNs) are crucial in various applications, but their deployment on resource-constrained edge devices poses challenges. This study presents the Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic to minimize response time and enhance overall performance. The study proposes a methodology for fusing multiple convolution layers to reduce off-chip memory communication and increase overall performance. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption without compromising accuracy. Furthermore, efficient tile movement guarantees uniform access to the fusion pyramid. An analysis demonstrates the utile stride strategy improves operational intensity. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and another focuses on resource-constrained devices with comparable latency. This approach notably reduced redundant computations, improving the efficiency of CNN deployment on edge devices.

Submitted 13 May, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

Comments: Accepted for publication in the Journal of Systems Architecture on 11 May, 2025

arXiv:2411.19389 [pdf, other]

Model-Agnostic Tagging of Quenched Jets in Heavy-Ion Collisions

Authors: Umar Sohail Qureshi, Raghav Kunnawalkam Elayavalli

Abstract: Measurements of jet substructure in ultra-relativistic heavy-ion collisions indicate that interactions with the quark-gluon plasma quench the jet showering process. Modern data-driven methods have shown promise in probing these modifications in the jet's hard substructure. In this Letter, we present a machine learning framework to identify quenched jets while accounting for pileup, uncorrelated soft particle background, and detector effects; a more experimentally realistic and challenging scenario than previously addressed. Our approach leverages an interpretable sequential attention-based mechanism that integrates representations of individual jet constituents alongside global jet observables as features. The framework sets a new benchmark for tagging quenched jets with reduced model dependence.

Submitted 28 November, 2024; originally announced November 2024.

arXiv:2411.19325 [pdf, other]

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Authors: Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Shahbaz Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan

Abstract: While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they do not effectively address the specific challenges of geospatial applications. Generic VLM benchmarks are not designed to handle the complexities of geospatial data, an essential component for applications such as environmental monitoring, urban planning, and disaster management. Key challenges in the geospatial domain include temporal change detection, large-scale object counting, tiny object detection, and understanding relationships between entities in remote sensing imagery. To bridge this gap, we present GEOBench-VLM, a comprehensive benchmark specifically designed to evaluate VLMs on geospatial tasks, including scene understanding, object counting, localization, fine-grained categorization, segmentation, and temporal analysis. Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges. The results indicate that although existing VLMs demonstrate potential, they face challenges when dealing with geospatial-specific tasks, highlighting the room for further improvements. Notably, the best-performing LLaVa-OneVision achieves only 41.7% accuracy on MCQs, slightly more than GPT-4o, which is approximately double the random guess performance. Our benchmark is publicly available at https://github.com/The-AI-Alliance/GEO-Bench-VLM .

Submitted 12 March, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

Comments: This updated version includes revisions and additional analysis

arXiv:2411.16897 [pdf, other]

A New Herwig7 Underlying Event Tune: from RHIC to LHC Energies

Authors: Umar Sohail Qureshi, Raghav Kunnawalkam Elayavalli, Luke Mozarsky, Helen Caines, Isaac Mooney

Abstract: We present parameter sets corresponding to new underlying event tunes for the Herwig7.3 Monte Carlo event generator. The existing Herwig tunes are in good agreement with LHC data; however, they are not typically designed for center-of-mass energies below $\sqrt{s}=300$ GeV. The tunes presented in this study can describe mid-rapidity data collected at the nominal RHIC energy of $\sqrt{s}=200$ GeV, as well as higher center-of-mass energies utilized by experiments elsewhere, such as the LHC. The base "New Haven" tune is developed by fitting minimum-bias simulations of proton-proton collisions to mid-rapidity identified hadron and jet data from the STAR experiment. The "Nashville" tune includes a separate set of parameters developed by tuning to Tevatron proton-antiproton data at $\sqrt{s}=300$, $900$ and $1960$ GeV from CDF, and LHC proton-proton measurements from CMS at $\sqrt{s}=7$ TeV, in addition to the STAR measurements. Both new tunes demonstrate significant improvements over the recommended default tune currently included in the latest version of Herwig for minimum bias production. As such, we advocate using these tunes for future simulation studies at mid-rapidity by the experimental collaborations at RHIC (STAR and sPHENIX) and the LHC (ATLAS, ALICE, CMS).

Submitted 26 November, 2024; v1 submitted 25 November, 2024; originally announced November 2024.

arXiv:2411.13837 [pdf, other]

Probing Compressed Mass Spectrum Supersymmetry at the LHC with the Vector Boson Fusion Topology

Authors: Umar Sohail Qureshi, Alfredo Gurrola, Andres Flórez

Abstract: We present a phenomenology study probing pair production of supersymmetric charginos and neutralinos ("electroweakinos") with the vector boson fusion (VBF) topology in proton-proton collisions at CERN's Large Hadron Collider (LHC). In particular, we examine the compressed-mass spectrum phase space that has been traditionally challenging due to experimental constraints. The final states considered have two jets, large missing transverse momentum, and one, two, or three light leptons. Different model scenarios are considered for the production and decays of the electroweakinos. A novel high-performance and interpretable sequential attention-based machine learning algorithm is employed for signal-background discrimination and is observed to significantly improve signal sensitivity over traditional methods. We report expected signal significances for integrated luminosities of $137$, $300$, and $3000$ $\textrm{fb}^{-1}$ corresponding to the current data acquired at the LHC, expectation for the end of Run 3, and the expectation for the high-luminosity LHC. Our methodology results in projected 95% confidence level bounds that cover chargino masses up to 1.1 TeV in compressed-mass spectrum scenarios within the R-parity conserving minimal supersymmetric standard model. This parameter space, currently beyond the reach of ATLAS and CMS searches at the LHC, is traditionally challenging to explore due to significant Standard Model backgrounds and low signal cross-sections.

Submitted 20 November, 2024; originally announced November 2024.

arXiv:2411.05853 [pdf, ps, other]

A Fundamental Accuracy--Robustness Trade-off in Regression and Classification

Authors: Sohail Bahmani

Abstract: We derive a fundamental trade-off between standard and adversarial risk in a rather general situation that formalizes the following simple intuition: "If no (nearly) optimal predictor is smooth, adversarial robustness comes at the cost of accuracy." As a concrete example, we evaluate the derived trade-off in regression with polynomial ridge functions under mild regularity conditions. Generalizing our analysis of this example, we formulate a necessary condition under which adversarial robustness can be achieved without significant degradation of the accuracy. This necessary condition is expressed in terms of a quantity that resembles the Poincaré constant of the data distribution.

Submitted 28 June, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

arXiv:2411.05296 [pdf, other]

On Training of Kolmogorov-Arnold Networks

Authors: Shairoz Sohail

Abstract: Kolmogorov-Arnold Networks have recently been introduced as a flexible alternative to multi-layer Perceptron architectures. In this paper, we examine the training dynamics of different KAN architectures and compare them with corresponding MLP formulations. We train with a variety of different initialization schemes, optimizers, and learning rates, as well as utilize back propagation free approaches like the HSIC Bottleneck. We find that (when judged by test accuracy) KANs are an effective alternative to MLP architectures on high-dimensional datasets and have somewhat better parameter efficiency, but suffer from more unstable training dynamics. Finally, we provide recommendations for improving training stability of larger KAN models.

Submitted 7 November, 2024; originally announced November 2024.

Comments: 7 pages, 6 figures

ACM Class: I.2.4

arXiv:2410.22361 [pdf, other]

Lecture Notes on Grid Modeling of Renewable Energy

Authors: Sohail Khan

Abstract: These lecture notes provide a comprehensive guide on Grid Modeling of Renewable Energy, offering a foundational overview of power system network modeling, power flow, and load flow algorithms critical for electrical and renewable energy engineering. Key topics include steady-state, dynamic, and frequency domain models, with a particular focus on renewable energy integration, simulation techniques, and their effects on grid stability and power quality. Practical examples using Matpower and Pandapower tools are included to reinforce concepts, ensuring that students gain hands-on experience in modeling and analyzing modern energy systems under variable conditions.

Submitted 26 October, 2024; originally announced October 2024.

Comments: Lecture Notes

ACM Class: H.1

Showing 1–50 of 254 results for author: Sohail