Search | arXiv e-print repository

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation

Authors: Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Muhammad Haris Khan, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan, Salman Khan

Abstract: Modern Earth observation (EO) increasingly leverages deep learning to harness the scale and diversity of satellite imagery across sensors and regions. While recent foundation models have demonstrated promising generalization across EO tasks, many remain limited by the scale, geographical coverage, and spectral diversity of their training data, factors critical for learning globally transferable re… ▽ More Modern Earth observation (EO) increasingly leverages deep learning to harness the scale and diversity of satellite imagery across sensors and regions. While recent foundation models have demonstrated promising generalization across EO tasks, many remain limited by the scale, geographical coverage, and spectral diversity of their training data, factors critical for learning globally transferable representations. In this work, we introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery, combined with large spatial tiles and land-cover aware sampling to enrich spatial and semantic coverage. By treating sensing modalities as natural augmentations in our self-supervised approach, we unify radar and optical inputs via modality-specific patch embeddings and adaptive cross-attention fusion. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism that incorporates class-frequency-aware regularization to address long-tailed distributions in land cover.TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench. Our code and pretrained models are publicly available at: https://github.com/mbzuai-oryx/TerraFM . △ Less

Submitted 6 June, 2025; originally announced June 2025.

arXiv:2506.00831 [pdf]

A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems

Authors: M Sabbir Salek, Mashrur Chowdhury, Muhaimin Bin Munir, Yuchen Cai, Mohammad Imtiaz Hasan, Jean-Michel Tine, Latifur Khan, Mizanur Rahman

Abstract: Modern transportation systems rely on cyber-physical systems (CPS), where cyber systems interact seamlessly with physical systems like transportation-related sensors and actuators to enhance safety, mobility, and energy efficiency. However, growing automation and connectivity increase exposure to cyber vulnerabilities. Existing threat modeling frameworks for transportation CPS are often limited in… ▽ More Modern transportation systems rely on cyber-physical systems (CPS), where cyber systems interact seamlessly with physical systems like transportation-related sensors and actuators to enhance safety, mobility, and energy efficiency. However, growing automation and connectivity increase exposure to cyber vulnerabilities. Existing threat modeling frameworks for transportation CPS are often limited in scope, resource-intensive, and dependent on significant cybersecurity expertise. To address these gaps, we present TraCR-TMF (Transportation Cybersecurity and Resiliency Threat Modeling Framework), a large language model (LLM)-based framework that minimizes expert intervention. TraCR-TMF identifies threats, potential attack techniques, and corresponding countermeasures by leveraging the MITRE ATT&CK matrix through three LLM-based approaches: (i) a retrieval-augmented generation (RAG) method requiring no expert input, (ii) an in-context learning approach requiring low expert input, and (iii) a supervised fine-tuning method requiring moderate expert input. TraCR-TMF also maps attack paths to critical assets by analyzing vulnerabilities using a customized LLM. The framework was evaluated in two scenarios. First, it identified relevant attack techniques across transportation CPS applications, with 90% precision as validated by experts. Second, using a fine-tuned LLM, it successfully predicted multiple exploitations including lateral movement, data exfiltration, and ransomware-related encryption that occurred during a major real-world cyberattack incident. These results demonstrate TraCR-TMF's effectiveness in CPS threat modeling, its reduced reliance on cybersecurity expertise, and its adaptability across CPS domains. △ Less

Submitted 1 June, 2025; originally announced June 2025.

arXiv:2505.23752 [pdf, ps, other]

ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks

Authors: Akashah Shabbir, Muhammad Akhtar Munir, Akshay Dudhane, Muhammad Umer Sheikh, Muhammad Haris Khan, Paolo Fraccaro, Juan Bernabe Moreno, Fahad Shahbaz Khan, Salman Khan

Abstract: Recent progress in large language models (LLMs) has enabled tool-augmented agents capable of solving complex real-world tasks through step-by-step reasoning. However, existing evaluations often focus on general-purpose or multimodal scenarios, leaving a gap in domain-specific benchmarks that assess tool-use capabilities in complex remote sensing use cases. We present ThinkGeo, an agentic benchmark… ▽ More Recent progress in large language models (LLMs) has enabled tool-augmented agents capable of solving complex real-world tasks through step-by-step reasoning. However, existing evaluations often focus on general-purpose or multimodal scenarios, leaving a gap in domain-specific benchmarks that assess tool-use capabilities in complex remote sensing use cases. We present ThinkGeo, an agentic benchmark designed to evaluate LLM-driven agents on remote sensing tasks via structured tool use and multi-step planning. Inspired by tool-interaction paradigms, ThinkGeo includes human-curated queries spanning a wide range of real-world applications such as urban planning, disaster assessment and change analysis, environmental monitoring, transportation analysis, aviation monitoring, recreational infrastructure, and industrial site analysis. Each query is grounded in satellite or aerial imagery and requires agents to reason through a diverse toolset. We implement a ReAct-style interaction loop and evaluate both open and closed-source LLMs (e.g., GPT-4o, Qwen2.5) on 436 structured agentic tasks. The benchmark reports both step-wise execution metrics and final answer correctness. Our analysis reveals notable disparities in tool accuracy and planning consistency across models. ThinkGeo provides the first extensive testbed for evaluating how tool-enabled LLMs handle spatial reasoning in remote sensing. Our code and dataset are publicly available △ Less

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2503.12096 [pdf, other]

O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models

Authors: Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah, Salman Khan, Muhammad Haris Khan

Abstract: Test-time prompt tuning for vision-language models (VLMs) is getting attention because of their ability to learn with unlabeled data without fine-tuning. Although test-time prompt tuning methods for VLMs can boost accuracy, the resulting models tend to demonstrate poor calibration, which casts doubts on the reliability and trustworthiness of these models. Notably, more attention needs to be devote… ▽ More Test-time prompt tuning for vision-language models (VLMs) is getting attention because of their ability to learn with unlabeled data without fine-tuning. Although test-time prompt tuning methods for VLMs can boost accuracy, the resulting models tend to demonstrate poor calibration, which casts doubts on the reliability and trustworthiness of these models. Notably, more attention needs to be devoted to calibrating the test-time prompt tuning in vision-language models. To this end, we propose a new approach, called O-TPT that introduces orthogonality constraints on the textual features corresponding to the learnable prompts for calibrating test-time prompt tuning in VLMs. Towards introducing orthogonality constraints, we make the following contributions. First, we uncover new insights behind the suboptimal calibration performance of existing methods relying on textual feature dispersion. Second, we show that imposing a simple orthogonalization of textual features is a more effective approach towards obtaining textual dispersion. We conduct extensive experiments on various datasets with different backbones and baselines. The results indicate that our method consistently outperforms the prior state of the art in significantly reducing the overall average calibration error. Also, our method surpasses the zero-shot calibration performance on fine-grained classification tasks. △ Less

Submitted 15 March, 2025; originally announced March 2025.

Comments: Accepted at CVPR 2025

arXiv:2502.17919 [pdf, other]

AirCast: Improving Air Pollution Forecasting Through Multi-Variable Data Alignment

Authors: Vishal Nedungadi, Muhammad Akhtar Munir, Marc Rußwurm, Ron Sarafian, Ioannis N. Athanasiadis, Yinon Rudich, Fahad Shahbaz Khan, Salman Khan

Abstract: Air pollution remains a leading global health risk, exacerbated by rapid industrialization and urbanization, contributing significantly to morbidity and mortality rates. In this paper, we introduce AirCast, a novel multi-variable air pollution forecasting model, by combining weather and air quality variables. AirCast employs a multi-task head architecture that simultaneously forecasts atmospheric… ▽ More Air pollution remains a leading global health risk, exacerbated by rapid industrialization and urbanization, contributing significantly to morbidity and mortality rates. In this paper, we introduce AirCast, a novel multi-variable air pollution forecasting model, by combining weather and air quality variables. AirCast employs a multi-task head architecture that simultaneously forecasts atmospheric conditions and pollutant concentrations, improving its understanding of how weather patterns affect air quality. Predicting extreme pollution events is challenging due to their rare occurrence in historic data, resulting in a heavy-tailed distribution of pollution levels. To address this, we propose a novel Frequency-weighted Mean Absolute Error (fMAE) loss, adapted from the class-balanced loss for regression tasks. Informed from domain knowledge, we investigate the selection of key variables known to influence pollution levels. Additionally, we align existing weather and chemical datasets across spatial and temporal dimensions. AirCast's integrated approach, combining multi-task learning, frequency weighted loss and domain informed variable selection, enables more accurate pollution forecasts. Our source code and models are made public here (https://github.com/vishalned/AirCast.git) △ Less

Submitted 25 February, 2025; originally announced February 2025.

arXiv:2412.15190 [pdf, other]

EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues

Authors: Sagar Soni, Akshay Dudhane, Hiyam Debary, Mustansar Fiaz, Muhammad Akhtar Munir, Muhammad Sohail Danish, Paolo Fraccaro, Campbell D Watson, Levente J Klein, Fahad Shahbaz Khan, Salman Khan

Abstract: Automated analysis of vast Earth observation data via interactive Vision-Language Models (VLMs) can unlock new opportunities for environmental monitoring, disaster response, and {resource management}. Existing generic VLMs do not perform well on Remote Sensing data, while the recent Geo-spatial VLMs remain restricted to a fixed resolution and few sensor modalities. In this paper, we introduce Eart… ▽ More Automated analysis of vast Earth observation data via interactive Vision-Language Models (VLMs) can unlock new opportunities for environmental monitoring, disaster response, and {resource management}. Existing generic VLMs do not perform well on Remote Sensing data, while the recent Geo-spatial VLMs remain restricted to a fixed resolution and few sensor modalities. In this paper, we introduce EarthDial, a conversational assistant specifically designed for Earth Observation (EO) data, transforming complex, multi-sensory Earth observations into interactive, natural language dialogues. EarthDial supports multi-spectral, multi-temporal, and multi-resolution imagery, enabling a wide range of remote sensing tasks, including classification, detection, captioning, question answering, visual reasoning, and visual grounding. To achieve this, we introduce an extensive instruction tuning dataset comprising over 11.11M instruction pairs covering RGB, Synthetic Aperture Radar (SAR), and multispectral modalities such as Near-Infrared (NIR) and infrared. Furthermore, EarthDial handles bi-temporal and multi-temporal sequence analysis for applications like change detection. Our extensive experimental results on 44 downstream datasets demonstrate that EarthDial outperforms existing generic and domain-specific models, achieving better generalization across various EO tasks. Our source codes and pre-trained models are at https://github.com/hiyamdebary/EarthDial. △ Less

Submitted 7 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

arXiv:2412.10995 [pdf, other]

RapidNet: Multi-Level Dilated Convolution Based Mobile Backbone

Authors: Mustafa Munir, Md Mostafijur Rahman, Radu Marculescu

Abstract: Vision transformers (ViTs) have dominated computer vision in recent years. However, ViTs are computationally expensive and not well suited for mobile devices; this led to the prevalence of convolutional neural network (CNN) and ViT-based hybrid models for mobile vision applications. Recently, Vision GNN (ViG) and CNN hybrid models have also been proposed for mobile vision tasks. However, all of th… ▽ More Vision transformers (ViTs) have dominated computer vision in recent years. However, ViTs are computationally expensive and not well suited for mobile devices; this led to the prevalence of convolutional neural network (CNN) and ViT-based hybrid models for mobile vision applications. Recently, Vision GNN (ViG) and CNN hybrid models have also been proposed for mobile vision tasks. However, all of these methods remain slower compared to pure CNN-based models. In this work, we propose Multi-Level Dilated Convolutions to devise a purely CNN-based mobile backbone. Using Multi-Level Dilated Convolutions allows for a larger theoretical receptive field than standard convolutions. Different levels of dilation also allow for interactions between the short-range and long-range features in an image. Experiments show that our proposed model outperforms state-of-the-art (SOTA) mobile CNN, ViT, ViG, and hybrid architectures in terms of accuracy and/or speed on image classification, object detection, instance segmentation, and semantic segmentation. Our fastest model, RapidNet-Ti, achieves 76.3\% top-1 accuracy on ImageNet-1K with 0.9 ms inference latency on an iPhone 13 mini NPU, which is faster and more accurate than MobileNetV2x1.4 (74.7\% top-1 with 1.0 ms latency). Our work shows that pure CNN architectures can beat SOTA hybrid and ViT models in terms of accuracy and speed when designed properly. △ Less

Submitted 14 December, 2024; originally announced December 2024.

Comments: Accepted in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025)

arXiv:2411.19325 [pdf, other]

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

Authors: Muhammad Sohail Danish, Muhammad Akhtar Munir, Syed Roshaan Ali Shah, Kartik Kuckreja, Fahad Shahbaz Khan, Paolo Fraccaro, Alexandre Lacoste, Salman Khan

Abstract: While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they do not effectively address the specific challenges of geospatial applications. Generic VLM benchmarks are not designed to handle the complexities of geospatial data, an essential component for applications such as environmental monitoring, urban planning, and disaster management. Key challenges in the… ▽ More While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they do not effectively address the specific challenges of geospatial applications. Generic VLM benchmarks are not designed to handle the complexities of geospatial data, an essential component for applications such as environmental monitoring, urban planning, and disaster management. Key challenges in the geospatial domain include temporal change detection, large-scale object counting, tiny object detection, and understanding relationships between entities in remote sensing imagery. To bridge this gap, we present GEOBench-VLM, a comprehensive benchmark specifically designed to evaluate VLMs on geospatial tasks, including scene understanding, object counting, localization, fine-grained categorization, segmentation, and temporal analysis. Our benchmark features over 10,000 manually verified instructions and spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges. The results indicate that although existing VLMs demonstrate potential, they face challenges when dealing with geospatial-specific tasks, highlighting the room for further improvements. Notably, the best-performing LLaVa-OneVision achieves only 41.7% accuracy on MCQs, slightly more than GPT-4o, which is approximately double the random guess performance. Our benchmark is publicly available at https://github.com/The-AI-Alliance/GEO-Bench-VLM . △ Less

Submitted 12 March, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

Comments: This updated version includes revisions and additional analysis

arXiv:2410.12061 [pdf, other]

CrediRAG: Network-Augmented Credibility-Based Retrieval for Misinformation Detection in Reddit

Authors: Ashwin Ram, Yigit Ege Bayiz, Arash Amini, Mustafa Munir, Radu Marculescu

Abstract: Fake news threatens democracy and exacerbates the polarization and divisions in society; therefore, accurately detecting online misinformation is the foundation of addressing this issue. We present CrediRAG, the first fake news detection model that combines language models with access to a rich external political knowledge base with a dense social network to detect fake news across social media at… ▽ More Fake news threatens democracy and exacerbates the polarization and divisions in society; therefore, accurately detecting online misinformation is the foundation of addressing this issue. We present CrediRAG, the first fake news detection model that combines language models with access to a rich external political knowledge base with a dense social network to detect fake news across social media at scale. CrediRAG uses a news retriever to initially assign a misinformation score to each post based on the source credibility of similar news articles to the post title content. CrediRAG then improves the initial retrieval estimations through a novel weighted post-to-post network connected based on shared commenters and weighted by the average stance of all shared commenters across every pair of posts. We achieve 11% increase in the F1-score in detecting misinformative posts over state-of-the-art methods. Extensive experiments conducted on curated real-world Reddit data of over 200,000 posts demonstrate the superior performance of CrediRAG on existing baselines. Thus, our approach offers a more accurate and scalable solution to combat the spread of fake news across social media platforms. △ Less

Submitted 26 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

arXiv:2409.07585 [pdf, other]

Efficient Localized Adaptation of Neural Weather Forecasting: A Case Study in the MENA Region

Authors: Muhammad Akhtar Munir, Fahad Shahbaz Khan, Salman Khan

Abstract: Accurate weather and climate modeling is critical for both scientific advancement and safeguarding communities against environmental risks. Traditional approaches rely heavily on Numerical Weather Prediction (NWP) models, which simulate energy and matter flow across Earth's systems. However, heavy computational requirements and low efficiency restrict the suitability of NWP, leading to a pressing… ▽ More Accurate weather and climate modeling is critical for both scientific advancement and safeguarding communities against environmental risks. Traditional approaches rely heavily on Numerical Weather Prediction (NWP) models, which simulate energy and matter flow across Earth's systems. However, heavy computational requirements and low efficiency restrict the suitability of NWP, leading to a pressing need for enhanced modeling techniques. Neural network-based models have emerged as promising alternatives, leveraging data-driven approaches to forecast atmospheric variables. In this work, we focus on limited-area modeling and train our model specifically for localized region-level downstream tasks. As a case study, we consider the MENA region due to its unique climatic challenges, where accurate localized weather forecasting is crucial for managing water resources, agriculture and mitigating the impacts of extreme weather events. This targeted approach allows us to tailor the model's capabilities to the unique conditions of the region of interest. Our study aims to validate the effectiveness of integrating parameter-efficient fine-tuning (PEFT) methodologies, specifically Low-Rank Adaptation (LoRA) and its variants, to enhance forecast accuracy, as well as training speed, computational resource utilization, and memory efficiency in weather and climate modeling for specific regions. △ Less

Submitted 11 September, 2024; originally announced September 2024.

Comments: Our codebase and pre-trained models can be accessed at: [this url](https://github.com/akhtarvision/weather-regional)

arXiv:2406.05850 [pdf, other]

Scaling Graph Convolutions for Mobile Vision

Authors: William Avery, Mustafa Munir, Radu Marculescu

Abstract: To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling p… ▽ More To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves a 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves an 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Besides image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 $AP^{box}$ and 0.7 $AP^{mask}$, and MobileViGv2-B outperforms MobileViG-B by 1.0 $AP^{box}$ and 0.7 $AP^{mask}$. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% $mIoU$ and MobileViGv2-B achieves 44.3% $mIoU$. Our code can be found at \url{https://github.com/SLDGroup/MobileViGv2}. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

arXiv:2406.04873 [pdf, other]

Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior

Authors: Tanvir Mahmud, Mustafa Munir, Radu Marculescu, Diana Marculescu

Abstract: Video-to-video synthesis poses significant challenges in maintaining character consistency, smooth temporal transitions, and preserving visual quality during fast motion. While recent fully cross-frame self-attention mechanisms have improved character consistency across multiple frames, they come with high computational costs and often include redundant operations, especially for videos with highe… ▽ More Video-to-video synthesis poses significant challenges in maintaining character consistency, smooth temporal transitions, and preserving visual quality during fast motion. While recent fully cross-frame self-attention mechanisms have improved character consistency across multiple frames, they come with high computational costs and often include redundant operations, especially for videos with higher frame rates. To address these inefficiencies, we propose an adaptive motion-guided cross-frame attention mechanism that selectively reduces redundant computations. This enables a greater number of cross-frame attentions over more frames within the same computational budget, thereby enhancing both video quality and temporal coherence. Our method leverages optical flow to focus on moving regions while sparsely attending to stationary areas, allowing for the joint editing of more frames without increasing computational demands. Traditional frame interpolation techniques struggle with motion blur and flickering in intermediate frames, which compromises visual fidelity. To mitigate this, we introduce KV-caching for jointly edited frames, reusing keys and values across intermediate frames to preserve visual quality and maintain temporal consistency throughout the video. With our adaptive cross-frame self-attention approach, we achieve a threefold increase in the number of keyframes processed compared to existing methods, all within the same computational budget as fully cross-frame attention baselines. This results in significant improvements in prediction accuracy and temporal consistency, outperforming state-of-the-art approaches. Code will be made publicly available at https://github.com/tanvir-utexas/AdaVE/tree/main △ Less

Submitted 10 November, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted in WACV 2025. Project page: https://tanvir-utexas.github.io/AdaVE_Demo/

arXiv:2405.16740 [pdf, other]

PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation

Authors: Md Mostafijur Rahman, Mustafa Munir, Debesh Jha, Ulas Bagci, Radu Marculescu

Abstract: The Segment Anything Model (SAM), originally designed for general-purpose segmentation tasks, has been used recently for polyp segmentation. Nonetheless, fine-tuning SAM with data from new imaging centers or clinics poses significant challenges. This is because this necessitates the creation of an expensive and time-intensive annotated dataset, along with the potential for variability in user prom… ▽ More The Segment Anything Model (SAM), originally designed for general-purpose segmentation tasks, has been used recently for polyp segmentation. Nonetheless, fine-tuning SAM with data from new imaging centers or clinics poses significant challenges. This is because this necessitates the creation of an expensive and time-intensive annotated dataset, along with the potential for variability in user prompts during inference. To address these issues, we propose a robust fine-tuning technique, PP-SAM, that allows SAM to adapt to the polyp segmentation task with limited images. To this end, we utilize variable perturbed bounding box prompts (BBP) to enrich the learning context and enhance the model's robustness to BBP perturbations during inference. Rigorous experiments on polyp segmentation benchmarks reveal that our variable BBP perturbation significantly improves model resilience. Notably, on Kvasir, 1-shot fine-tuning boosts the DICE score by 20% and 37% with 50 and 100-pixel BBP perturbations during inference, respectively. Moreover, our experiments show that 1-shot, 5-shot, and 10-shot PP-SAM with 50-pixel perturbations during inference outperform a recent state-of-the-art (SOTA) polyp segmentation method by 26%, 7%, and 5% DICE scores, respectively. Our results motivate the broader applicability of our PP-SAM for other medical imaging tasks with limited samples. Our implementation is available at https://github.com/SLDGroup/PP-SAM. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: 7 pages, 9 figures, Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

arXiv:2405.14497 [pdf, other]

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Authors: Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali

Abstract: In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set o… ▽ More In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set of augmentations, a base detector can outperform existing methods for single domain generalization by a good margin. This highlights the importance of domain diversification in improving the performance of object detectors. Secondly, we introduce a method to align detections from multiple views, considering both classification and localization outputs. This alignment procedure leads to better generalized and well-calibrated object detector models, which are crucial for accurate decision-making in safety-critical applications. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors. To validate the effectiveness of our proposed methods, we conduct extensive experiments and ablations on challenging domain-shift scenarios. The results consistently demonstrate the superiority of our approach compared to existing methods. Our code and models are available at: https://github.com/msohaildanish/DivAlign △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06880 [pdf, other]

EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation

Authors: Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu

Abstract: An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficienc… ▽ More An efficient and effective decoding mechanism is crucial in medical image segmentation, especially in scenarios with limited computational resources. However, these decoding mechanisms usually come with high computational costs. To address this concern, we introduce EMCAD, a new efficient multi-scale convolutional attention decoder, designed to optimize both performance and computational efficiency. EMCAD leverages a unique multi-scale depth-wise convolution block, significantly enhancing feature maps through multi-scale convolutions. EMCAD also employs channel, spatial, and grouped (large-kernel) gated attention mechanisms, which are highly effective at capturing intricate spatial relationships while focusing on salient regions. By employing group and depth-wise convolution, EMCAD is very efficient and scales well (e.g., only 1.91M parameters and 0.381G FLOPs are needed when using a standard encoder). Our rigorous evaluations across 12 datasets that belong to six medical image segmentation tasks reveal that EMCAD achieves state-of-the-art (SOTA) performance with 79.4% and 80.3% reduction in #Params and #FLOPs, respectively. Moreover, EMCAD's adaptability to different encoders and versatility across segmentation tasks further establish EMCAD as a promising tool, advancing the field towards more efficient and accurate medical image analysis. Our implementation is available at https://github.com/SLDGroup/EMCAD. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2405.06849 [pdf, other]

GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs

Authors: Mustafa Munir, William Avery, Md Mostafijur Rahman, Radu Marculescu

Abstract: Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. To solve this issue, we propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN as it limits the number of considered graph connections made… ▽ More Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. To solve this issue, we propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN as it limits the number of considered graph connections made within an image. Additionally, we propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC. Extensive experiments show that GreedyViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification, object detection, instance segmentation, and semantic segmentation tasks. Our smallest model, GreedyViG-S, achieves 81.1% top-1 accuracy on ImageNet-1K, 2.9% higher than Vision GNN and 2.2% higher than Vision HyperGraph Neural Network (ViHGNN), with less GMACs and a similar number of parameters. Our largest model, GreedyViG-B obtains 83.9% top-1 accuracy, 0.2% higher than Vision GNN, with a 66.6% decrease in parameters and a 69% decrease in GMACs. GreedyViG-B also obtains the same accuracy as ViHGNN with a 67.3% decrease in parameters and a 71.3% decrease in GMACs. Our work shows that hybrid CNN-GNN architectures not only provide a new avenue for designing efficient models, but that they can also exceed the performance of current state-of-the-art models. △ Less

Submitted 10 May, 2024; originally announced May 2024.

Comments: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2403.06388 [pdf, other]

A Zero Trust Framework for Realization and Defense Against Generative AI Attacks in Power Grid

Authors: Md. Shirajum Munir, Sravanthi Proddatoori, Manjushree Muralidhara, Walid Saad, Zhu Han, Sachin Shetty

Abstract: Understanding the potential of generative AI (GenAI)-based attacks on the power grid is a fundamental challenge that must be addressed in order to protect the power grid by realizing and validating risk in new attack vectors. In this paper, a novel zero trust framework for a power grid supply chain (PGSC) is proposed. This framework facilitates early detection of potential GenAI-driven attack vect… ▽ More Understanding the potential of generative AI (GenAI)-based attacks on the power grid is a fundamental challenge that must be addressed in order to protect the power grid by realizing and validating risk in new attack vectors. In this paper, a novel zero trust framework for a power grid supply chain (PGSC) is proposed. This framework facilitates early detection of potential GenAI-driven attack vectors (e.g., replay and protocol-type attacks), assessment of tail risk-based stability measures, and mitigation of such threats. First, a new zero trust system model of PGSC is designed and formulated as a zero-trust problem that seeks to guarantee for a stable PGSC by realizing and defending against GenAI-driven cyber attacks. Second, in which a domain-specific generative adversarial networks (GAN)-based attack generation mechanism is developed to create a new vulnerability cyberspace for further understanding that threat. Third, tail-based risk realization metrics are developed and implemented for quantifying the extreme risk of a potential attack while leveraging a trust measurement approach for continuous validation. Fourth, an ensemble learning-based bootstrap aggregation scheme is devised to detect the attacks that are generating synthetic identities with convincing user and distributed energy resources device profiles. Experimental results show the efficacy of the proposed zero trust framework that achieves an accuracy of 95.7% on attack vector generation, a risk measure of 9.61% for a 95% stable PGSC, and a 99% confidence in defense against GenAI-driven attack. △ Less

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2311.04815 [pdf, other]

Domain Adaptive Object Detection via Balancing Between Self-Training and Adversarial Learning

Authors: Muhammad Akhtar Munir, Muhammad Haris Khan, M. Saquib Sarfraz, Mohsen Ali

Abstract: Deep learning based object detectors struggle generalizing to a new target domain bearing significant variations in object and background. Most current methods align domains by using image or instance-level adversarial feature alignment. This often suffers due to unwanted background and lacks class-specific alignment. A straightforward approach to promote class-level alignment is to use high confi… ▽ More Deep learning based object detectors struggle generalizing to a new target domain bearing significant variations in object and background. Most current methods align domains by using image or instance-level adversarial feature alignment. This often suffers due to unwanted background and lacks class-specific alignment. A straightforward approach to promote class-level alignment is to use high confidence predictions on unlabeled domain as pseudo-labels. These predictions are often noisy since model is poorly calibrated under domain shift. In this paper, we propose to leverage model's predictive uncertainty to strike the right balance between adversarial feature alignment and class-level alignment. We develop a technique to quantify predictive uncertainty on class assignments and bounding-box predictions. Model predictions with low uncertainty are used to generate pseudo-labels for self-training, whereas the ones with higher uncertainty are used to generate tiles for adversarial feature alignment. This synergy between tiling around uncertain object regions and generating pseudo-labels from highly certain object regions allows capturing both image and instance-level context during the model adaptation. We report thorough ablation study to reveal the impact of different components in our approach. Results on five diverse and challenging adaptation scenarios show that our approach outperforms existing state-of-the-art methods with noticeable margins. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 45, Issue: 12, December 2023); Extended version of our conference paper, arXiv link: arXiv:2110.00249

arXiv:2311.03570 [pdf, other]

Cal-DETR: Calibrated Detection Transformer

Authors: Muhammad Akhtar Munir, Salman Khan, Muhammad Haris Khan, Mohsen Ali, Fahad Shahbaz Khan

Abstract: Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs, however, almost all of them focus on the classification task. Surprisingly, very little at… ▽ More Albeit revealing impressive predictive performance for several computer vision tasks, deep neural networks (DNNs) are prone to making overconfident predictions. This limits the adoption and wider utilization of DNNs in many safety-critical applications. There have been recent efforts toward calibrating DNNs, however, almost all of them focus on the classification task. Surprisingly, very little attention has been devoted to calibrating modern DNN-based object detectors, especially detection transformers, which have recently demonstrated promising detection performance and are influential in many decision-making systems. In this work, we address the problem by proposing a mechanism for calibrated detection transformers (Cal-DETR), particularly for Deformable-DETR, UP-DETR and DINO. We pursue the train-time calibration route and make the following contributions. First, we propose a simple yet effective approach for quantifying uncertainty in transformer-based object detectors. Second, we develop an uncertainty-guided logit modulation mechanism that leverages the uncertainty to modulate the class logits. Third, we develop a logit mixing approach that acts as a regularizer with detection-specific losses and is also complementary to the uncertainty-guided logit modulation technique to further improve the calibration performance. Lastly, we conduct extensive experiments across three in-domain and four out-domain scenarios. Results corroborate the effectiveness of Cal-DETR against the competing train-time methods in calibrating both in-domain and out-domain detections while maintaining or even improving the detection performance. Our codebase and pre-trained models can be accessed at \url{https://github.com/akhtarvision/cal-detr}. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted at NeurIPS 2023

arXiv:2310.09021 [pdf, other]

Generative AI-driven Semantic Communication Framework for NextG Wireless Network

Authors: Avi Deb Raha, Md. Shirajum Munir, Apurba Adhikary, Yu Qiao, Choong Seon Hong

Abstract: This work designs a novel semantic communication (SemCom) framework for the next-generation wireless network to tackle the challenges of unnecessary transmission of vast amounts that cause high bandwidth consumption, more latency, and experience with bad quality of services (QoS). In particular, these challenges hinder applications like intelligent transportation systems (ITS), metaverse, mixed re… ▽ More This work designs a novel semantic communication (SemCom) framework for the next-generation wireless network to tackle the challenges of unnecessary transmission of vast amounts that cause high bandwidth consumption, more latency, and experience with bad quality of services (QoS). In particular, these challenges hinder applications like intelligent transportation systems (ITS), metaverse, mixed reality, and the Internet of Everything, where real-time and efficient data transmission is paramount. Therefore, to reduce communication overhead and maintain the QoS of emerging applications such as metaverse, ITS, and digital twin creation, this work proposes a novel semantic communication framework. First, an intelligent semantic transmitter is designed to capture the meaningful information (e.g., the rode-side image in ITS) by designing a domain-specific Mobile Segment Anything Model (MSAM)-based mechanism to reduce the potential communication traffic while QoS remains intact. Second, the concept of generative AI is introduced for building the SemCom to reconstruct and denoise the received semantic data frame at the receiver end. In particular, the Generative Adversarial Network (GAN) mechanism is designed to maintain a superior quality reconstruction under different signal-to-noise (SNR) channel conditions. Finally, we have tested and evaluated the proposed semantic communication (SemCom) framework with the real-world 6G scenario of ITS; in particular, the base station equipped with an RGB camera and a mmWave phased array. Experimental results demonstrate the efficacy of the proposed SemCom framework by achieving high-quality reconstruction across various SNR channel conditions, resulting in 93.45% data reduction in communication. △ Less

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.09236 [pdf, other]

Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures

Authors: Arif Mahmood, Abdul Basit, M. Akhtar Munir, Mohsen Ali

Abstract: Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction informat… ▽ More Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human-firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling layer. We extensively evaluate our approach against existing methods in human-object interaction detection and achieve significant results (AP=77.8\%) compared to the baseline approach (AP=63.1\%). This demonstrates the effectiveness of leveraging attention mechanisms and saliency-driven locality preservation for accurate human-firearm interaction detection. Our findings contribute to advancing the fields of security and surveillance, enabling more efficient firearm localization and identification in diverse scenarios. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: This paper is accepted in IEEE Transactions on Computational Social Systems

arXiv:2307.00395 [pdf, other]

MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications

Authors: Mustafa Munir, William Avery, Radu Marculescu

Abstract: Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have dominated computer vision. However, recently proposed vision graph neural networks (ViG) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based spars… ▽ More Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have dominated computer vision. However, recently proposed vision graph neural networks (ViG) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA), that is designed for ViGs running on mobile devices. Additionally, we propose the first hybrid CNN-GNN architecture for vision tasks on mobile devices, MobileViG, which uses SVGA. Extensive experiments show that MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. Our fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy on ImageNet-1K with 0.78 ms inference latency on iPhone 13 Mini NPU (compiled with CoreML), which is faster than MobileNetV2x1.4 (1.02 ms, 74.7% top-1) and MobileNetV2x1.0 (0.81 ms, 71.8% top-1). Our largest model, MobileViG-B obtains 82.6% top-1 accuracy with only 2.30 ms latency, which is faster and more accurate than the similarly sized EfficientFormer-L3 model (2.77 ms, 82.4%). Our work proves that well designed hybrid CNN-GNN architectures can be a new avenue of exploration for designing models that are extremely fast and accurate on mobile devices. Our code is publicly available at https://github.com/SLDGroup/MobileViG. △ Less

Submitted 1 July, 2023; originally announced July 2023.

Comments: Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

arXiv:2306.07993 [pdf, other]

Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid

Authors: Md. Shirajum Munir, Sachin Shetty, Danda B. Rawat

Abstract: The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this w… ▽ More The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status message of DERs. Thus, proposing and developing a trustworthy AI framework to facilitate the deployment of any AI algorithms for detecting potential cyber threats and analyzing root causes based on Shapley value interpretation while dynamically quantifying the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Comments: Submitted for peer review

arXiv:2304.01950 [pdf, other]

MP-FedCL: Multiprototype Federated Contrastive Learning for Edge Intelligence

Authors: Yu Qiao, Md. Shirajum Munir, Apurba Adhikary, Huy Q. Le, Avi Deb Raha, Chaoning Zhang, Choong Seon Hong

Abstract: Federated learning-assisted edge intelligence enables privacy protection in modern intelligent services. However, not independent and identically distributed (non-IID) distribution among edge clients can impair the local model performance. The existing single prototype-based strategy represents a class by using the mean of the feature space. However, feature spaces are usually not clustered, and a… ▽ More Federated learning-assisted edge intelligence enables privacy protection in modern intelligent services. However, not independent and identically distributed (non-IID) distribution among edge clients can impair the local model performance. The existing single prototype-based strategy represents a class by using the mean of the feature space. However, feature spaces are usually not clustered, and a single prototype may not represent a class well. Motivated by this, this paper proposes a multi-prototype federated contrastive learning approach (MP-FedCL) which demonstrates the effectiveness of using a multi-prototype strategy over a single-prototype under non-IID settings, including both label and feature skewness. Specifically, a multi-prototype computation strategy based on \textit{k-means} is first proposed to capture different embedding representations for each class space, using multiple prototypes ($k$ centroids) to represent a class in the embedding space. In each global round, the computed multiple prototypes and their respective model parameters are sent to the edge server for aggregation into a global prototype pool, which is then sent back to all clients to guide their local training. Finally, local training for each client minimizes their own supervised learning tasks and learns from shared prototypes in the global prototype pool through supervised contrastive learning, which encourages them to learn knowledge related to their own class from others and reduces the absorption of unrelated knowledge in each global iteration. Experimental results on MNIST, Digit-5, Office-10, and DomainNet show that our method outperforms multiple baselines, with an average test accuracy improvement of about 4.6\% and 10.4\% under feature and label non-IID distributions, respectively. △ Less

Submitted 11 October, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE Internet of Things

arXiv:2303.14404 [pdf, other]

Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection

Authors: Muhammad Akhtar Munir, Muhammad Haris Khan, Salman Khan, Fahad Shahbaz Khan

Abstract: Deep neural networks (DNNs) have enabled astounding progress in several vision-based problems. Despite showing high predictive accuracy, recently, several works have revealed that they tend to provide overconfident predictions and thus are poorly calibrated. The majority of the works addressing the miscalibration of DNNs fall under the scope of classification and consider only in-domain prediction… ▽ More Deep neural networks (DNNs) have enabled astounding progress in several vision-based problems. Despite showing high predictive accuracy, recently, several works have revealed that they tend to provide overconfident predictions and thus are poorly calibrated. The majority of the works addressing the miscalibration of DNNs fall under the scope of classification and consider only in-domain predictions. However, there is little to no progress in studying the calibration of DNN-based object detection models, which are central to many vision-based safety-critical applications. In this paper, inspired by the train-time calibration methods, we propose a novel auxiliary loss formulation that explicitly aims to align the class confidence of bounding boxes with the accurateness of predictions (i.e. precision). Since the original formulation of our loss depends on the counts of true positives and false positives in a minibatch, we develop a differentiable proxy of our loss that can be used during training with other application-specific loss functions. We perform extensive experiments on challenging in-domain and out-domain scenarios with six benchmark datasets including MS-COCO, Cityscapes, Sim10k, and BDD100k. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios. Our source code and pre-trained models are available at https://github.com/akhtarvision/bpc_calibration △ Less

Submitted 25 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR 2023

arXiv:2210.06649 [pdf, other]

Neuro-symbolic Explainable Artificial Intelligence Twin for Zero-touch IoE in Wireless Network

Authors: Md. Shirajum Munir, Ki Tae Kim, Apurba Adhikary, Walid Saad, Sachin Shetty, Seong-Bae Park, Choong Seon Hong

Abstract: Explainable artificial intelligence (XAI) twin systems will be a fundamental enabler of zero-touch network and service management (ZSM) for sixth-generation (6G) wireless networks. A reliable XAI twin system for ZSM requires two composites: an extreme analytical ability for discretizing the physical behavior of the Internet of Everything (IoE) and rigorous methods for characterizing the reasoning… ▽ More Explainable artificial intelligence (XAI) twin systems will be a fundamental enabler of zero-touch network and service management (ZSM) for sixth-generation (6G) wireless networks. A reliable XAI twin system for ZSM requires two composites: an extreme analytical ability for discretizing the physical behavior of the Internet of Everything (IoE) and rigorous methods for characterizing the reasoning of such behavior. In this paper, a novel neuro-symbolic explainable artificial intelligence twin framework is proposed to enable trustworthy ZSM for a wireless IoE. The physical space of the XAI twin executes a neural-network-driven multivariate regression to capture the time-dependent wireless IoE environment while determining unconscious decisions of IoE service aggregation. Subsequently, the virtual space of the XAI twin constructs a directed acyclic graph (DAG)-based Bayesian network that can infer a symbolic reasoning score over unconscious decisions through a first-order probabilistic language model. Furthermore, a Bayesian multi-arm bandits-based learning problem is proposed for reducing the gap between the expected explained score and the current obtained score of the proposed neuro-symbolic XAI twin. To address the challenges of extensible, modular, and stateless management functions in ZSM, the proposed neuro-symbolic XAI twin framework consists of two learning systems: 1) an implicit learner that acts as an unconscious learner in physical space, and 2) an explicit leaner that can exploit symbolic reasoning based on implicit learner decisions and prior evidence. Experimental results show that the proposed neuro-symbolic XAI twin can achieve around 96.26% accuracy while guaranteeing from 18% to 44% more trust score in terms of reasoning and closed-loop automation. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: Submitted to a journal for peer review

arXiv:2209.07601 [pdf, other]

Towards Improving Calibration in Object Detection Under Domain Shift

Authors: Muhammad Akhtar Munir, Muhammad Haris Khan, M. Saquib Sarfraz, Mohsen Ali

Abstract: With deep neural network based solution more readily being incorporated in real-world applications, it has been pressing requirement that predictions by such models, especially in safety-critical environments, be highly accurate and well-calibrated. Although some techniques addressing DNN calibration have been proposed, they are only limited to visual classification applications and in-domain pred… ▽ More With deep neural network based solution more readily being incorporated in real-world applications, it has been pressing requirement that predictions by such models, especially in safety-critical environments, be highly accurate and well-calibrated. Although some techniques addressing DNN calibration have been proposed, they are only limited to visual classification applications and in-domain predictions. Unfortunately, very little to no attention is paid towards addressing calibration of DNN-based visual object detectors, that occupy similar space and importance in many decision making systems as their visual classification counterparts. In this work, we study the calibration of DNN-based object detection models, particularly under domain shift. To this end, we first propose a new, plug-and-play, train-time calibration loss for object detection (coined as TCD). It can be used with various application-specific loss functions as an auxiliary loss function to improve detection calibration. Second, we devise a new implicit technique for improving calibration in self-training based domain adaptive detectors, featuring a new uncertainty quantification mechanism for object detection. We demonstrate TCD is capable of enhancing calibration with notable margins (1) across different DNN-based object detection paradigms both in in-domain and out-of-domain predictions, and (2) in different domain-adaptive detectors across challenging adaptation scenarios. Finally, we empirically show that our implicit calibration technique can be used in tandem with TCD during adaptation to further boost calibration in diverse domain shift scenarios. △ Less

Submitted 29 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: To appear in NeurIPS 2022

arXiv:2205.04712 [pdf, other]

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Authors: Julian Wörmann, Daniel Bogdoll, Christian Brunner, Etienne Bührle, Han Chen, Evaristus Fuh Chuo, Kostadin Cvejoski, Ludger van Elst, Philip Gottschall, Stefan Griesche, Christian Hellert, Christian Hesels, Sebastian Houben, Tim Joseph, Niklas Keil, Johann Kelsch, Mert Keser, Hendrik Königshof, Erwin Kraft, Leonie Kreuser, Kevin Krone, Tobias Latka, Denny Mattern, Stefan Matthes, Franz Motzkus , et al. (27 additional authors not shown)

Abstract: The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical con… ▽ More The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving. △ Less

Submitted 20 November, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: 111 pages, Added section on Run-time Network Verification

arXiv:2203.02331 [pdf, other]

F2DNet: Fast Focal Detection Network for Pedestrian Detection

Authors: Abdul Hannan Khan, Mohsin Munir, Ludger van Elst, Andreas Dengel

Abstract: Two-stage detectors are state-of-the-art in object detection as well as pedestrian detection. However, the current two-stage detectors are inefficient as they do bounding box regression in multiple steps i.e. in region proposal networks and bounding box heads. Also, the anchor-based region proposal networks are computationally expensive to train. We propose F2DNet, a novel two-stage detection arch… ▽ More Two-stage detectors are state-of-the-art in object detection as well as pedestrian detection. However, the current two-stage detectors are inefficient as they do bounding box regression in multiple steps i.e. in region proposal networks and bounding box heads. Also, the anchor-based region proposal networks are computationally expensive to train. We propose F2DNet, a novel two-stage detection architecture which eliminates redundancy of current two-stage detectors by replacing the region proposal network with our focal detection network and bounding box head with our fast suppression head. We benchmark F2DNet on top pedestrian detection datasets, thoroughly compare it against the existing state-of-the-art detectors and conduct cross dataset evaluation to test the generalizability of our model to unseen data. Our F2DNet achieves 8.7\%, 2.2\%, and 6.1\% MR-2 on City Persons, Caltech Pedestrian, and Euro City Person datasets respectively when trained on a single dataset and reaches 20.4\% and 26.2\% MR-2 in heavy occlusion setting of Caltech Pedestrian and City Persons datasets when using progressive fine-tunning. Furthermore, F2DNet have significantly lesser inference time compared to the current state-of-the-art. Code and trained models will be available at https://github.com/AbdulHannanKhan/F2DNet. △ Less

Submitted 23 September, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: Accepted at ICPR 2022

ACM Class: I.2.10; I.4.8; I.5.4

arXiv:2201.10822 [pdf, other]

An Explainable Artificial Intelligence Framework for Quality-Aware IoE Service Delivery

Authors: Md. Shirajum Munir, Seong-Bae Park, Choong Seon Hong

Abstract: One of the core envisions of the sixth-generation (6G) wireless networks is to accumulate artificial intelligence (AI) for autonomous controlling of the Internet of Everything (IoE). Particularly, the quality of IoE services delivery must be maintained by analyzing contextual metrics of IoE such as people, data, process, and things. However, the challenges incorporate when the AI model conceives a… ▽ More One of the core envisions of the sixth-generation (6G) wireless networks is to accumulate artificial intelligence (AI) for autonomous controlling of the Internet of Everything (IoE). Particularly, the quality of IoE services delivery must be maintained by analyzing contextual metrics of IoE such as people, data, process, and things. However, the challenges incorporate when the AI model conceives a lake of interpretation and intuition to the network service provider. Therefore, this paper provides an explainable artificial intelligence (XAI) framework for quality-aware IoE service delivery that enables both intelligence and interpretation. First, a problem of quality-aware IoE service delivery is formulated by taking into account network dynamics and contextual metrics of IoE, where the objective is to maximize the channel quality index (CQI) of each IoE service user. Second, a regression problem is devised to solve the formulated problem, where explainable coefficients of the contextual matrices are estimated by Shapley value interpretation. Third, the XAI-enabled quality-aware IoE service delivery algorithm is implemented by employing ensemble-based regression models for ensuring the interpretation of contextual relationships among the matrices to reconfigure network parameters. Finally, the experiment results show that the uplink improvement rate becomes 42.43% and 16.32% for the AdaBoost and Extra Trees, respectively, while the downlink improvement rate reaches up to 28.57% and 14.29%. However, the AdaBoost-based approach cannot maintain the CQI of IoE service users. Therefore, the proposed Extra Trees-based regression model shows significant performance gain for mitigating the trade-off between accuracy and interpretability than other baselines. △ Less

Submitted 26 January, 2022; originally announced January 2022.

arXiv:2111.14838 [pdf]

doi 10.1109/TII.2021.3124476

Evaluating Privacy-Preserving Machine Learning in Critical Infrastructures: A Case Study on Time-Series Classification

Authors: Dominique Mercier, Adriano Lucieri, Mohsin Munir, Andreas Dengel, Sheraz Ahmed

Abstract: With the advent of machine learning in applications of critical infrastructure such as healthcare and energy, privacy is a growing concern in the minds of stakeholders. It is pivotal to ensure that neither the model nor the data can be used to extract sensitive information used by attackers against individuals or to harm whole societies through the exploitation of critical infrastructure. The appl… ▽ More With the advent of machine learning in applications of critical infrastructure such as healthcare and energy, privacy is a growing concern in the minds of stakeholders. It is pivotal to ensure that neither the model nor the data can be used to extract sensitive information used by attackers against individuals or to harm whole societies through the exploitation of critical infrastructure. The applicability of machine learning in these domains is mostly limited due to a lack of trust regarding the transparency and the privacy constraints. Various safety-critical use cases (mostly relying on time-series data) are currently underrepresented in privacy-related considerations. By evaluating several privacy-preserving methods regarding their applicability on time-series data, we validated the inefficacy of encryption for deep learning, the strong dataset dependence of differential privacy, and the broad applicability of federated methods. △ Less

Submitted 29 November, 2021; originally announced November 2021.

Comments: 9 pages, 4 figures. 6 tables

arXiv:2110.14205 [pdf, other]

FedPrune: Towards Inclusive Federated Learning

Authors: Muhammad Tahir Munir, Muhammad Mustansar Saeed, Mahad Ali, Zafar Ayyub Qazi, Ihsan Ayyub Qazi

Abstract: Federated learning (FL) is a distributed learning technique that trains a shared model over distributed data in a privacy-preserving manner. Unfortunately, FL's performance degrades when there is (i) variability in client characteristics in terms of computational and memory resources (system heterogeneity) and (ii) non-IID data distribution across clients (statistical heterogeneity). For example,… ▽ More Federated learning (FL) is a distributed learning technique that trains a shared model over distributed data in a privacy-preserving manner. Unfortunately, FL's performance degrades when there is (i) variability in client characteristics in terms of computational and memory resources (system heterogeneity) and (ii) non-IID data distribution across clients (statistical heterogeneity). For example, slow clients get dropped in FL schemes, such as Federated Averaging (FedAvg), which not only limits overall learning but also biases results towards fast clients. We propose FedPrune; a system that tackles this challenge by pruning the global model for slow clients based on their device characteristics. By doing so, slow clients can train a small model quickly and participate in FL which increases test accuracy as well as fairness. By using insights from Central Limit Theorem, FedPrune incorporates a new aggregation technique that achieves robust performance over non-IID data. Experimental evaluation shows that Fed- Prune provides robust convergence and better fairness compared to Federated Averaging. △ Less

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2110.00249 [pdf, other]

Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection

Authors: Muhammad Akhtar Munir, Muhammad Haris Khan, M. Saquib Sarfraz, Mohsen Ali

Abstract: We study adapting trained object detectors to unseen domains manifesting significant variations of object appearance, viewpoints and backgrounds. Most current methods align domains by either using image or instance-level feature alignment in an adversarial fashion. This often suffers due to the presence of unwanted background and as such lacks class-specific alignment. A common remedy to promote c… ▽ More We study adapting trained object detectors to unseen domains manifesting significant variations of object appearance, viewpoints and backgrounds. Most current methods align domains by either using image or instance-level feature alignment in an adversarial fashion. This often suffers due to the presence of unwanted background and as such lacks class-specific alignment. A common remedy to promote class-level alignment is to use high confidence predictions on the unlabelled domain as pseudo labels. These high confidence predictions are often fallacious since the model is poorly calibrated under domain shift. In this paper, we propose to leverage model predictive uncertainty to strike the right balance between adversarial feature alignment and class-level alignment. Specifically, we measure predictive uncertainty on class assignments and the bounding box predictions. Model predictions with low uncertainty are used to generate pseudo-labels for self-supervision, whereas the ones with higher uncertainty are used to generate tiles for an adversarial feature alignment stage. This synergy between tiling around the uncertain object regions and generating pseudo-labels from highly certain object regions allows us to capture both the image and instance level context during the model adaptation stage. We perform extensive experiments covering various domain shift scenarios. Our approach improves upon existing state-of-the-art methods with visible margins. △ Less

Submitted 1 October, 2021; originally announced October 2021.

Comments: To appear in NeurIPS2021

arXiv:2108.01466 [pdf, other]

doi 10.1109/JIOT.2022.3149038

Risk Adversarial Learning System for Connected and Autonomous Vehicle Charging

Authors: Md. Shirajum Munir, Ki Tae Kim, Kyi Thar, Dusit Niyato, Choong Seon Hong

Abstract: In this paper, the design of a rational decision support system (RDSS) for a connected and autonomous vehicle charging infrastructure (CAV-CI) is studied. In the considered CAV-CI, the distribution system operator (DSO) deploys electric vehicle supply equipment (EVSE) to provide an EV charging facility for human-driven connected vehicles (CVs) and autonomous vehicles (AVs). The charging request by… ▽ More In this paper, the design of a rational decision support system (RDSS) for a connected and autonomous vehicle charging infrastructure (CAV-CI) is studied. In the considered CAV-CI, the distribution system operator (DSO) deploys electric vehicle supply equipment (EVSE) to provide an EV charging facility for human-driven connected vehicles (CVs) and autonomous vehicles (AVs). The charging request by the human-driven EV becomes irrational when it demands more energy and charging period than its actual need. Therefore, the scheduling policy of each EVSE must be adaptively accumulated the irrational charging request to satisfy the charging demand of both CVs and AVs. To tackle this, we formulate an RDSS problem for the DSO, where the objective is to maximize the charging capacity utilization by satisfying the laxity risk of the DSO. Thus, we devise a rational reward maximization problem to adapt the irrational behavior by CVs in a data-informed manner. We propose a novel risk adversarial multi-agent learning system (RAMALS) for CAV-CI to solve the formulated RDSS problem. In RAMALS, the DSO acts as a centralized risk adversarial agent (RAA) for informing the laxity risk to each EVSE. Subsequently, each EVSE plays the role of a self-learner agent to adaptively schedule its own EV sessions by coping advice from RAA. Experiment results show that the proposed RAMALS affords around 46.6% improvement in charging rate, about 28.6% improvement in the EVSE's active charging time and at least 33.3% more energy utilization, as compared to a currently deployed ACN EVSE system, and other baselines. △ Less

Submitted 2 February, 2022; v1 submitted 1 August, 2021; originally announced August 2021.

Comments: Accepted Article By IEEE Internet of Things Journal, DOI:10.1109/JIOT.2022.3149038 (In Press)

arXiv:2105.06677 [pdf, other]

XAI Handbook: Towards a Unified Framework for Explainable AI

Authors: Sebastian Palacio, Adriano Lucieri, Mohsin Munir, Jörn Hees, Sheraz Ahmed, Andreas Dengel

Abstract: The field of explainable AI (XAI) has quickly become a thriving and prolific community. However, a silent, recurrent and acknowledged issue in this area is the lack of consensus regarding its terminology. In particular, each new contribution seems to rely on its own (and often intuitive) version of terms like "explanation" and "interpretation". Such disarray encumbers the consolidation of advances… ▽ More The field of explainable AI (XAI) has quickly become a thriving and prolific community. However, a silent, recurrent and acknowledged issue in this area is the lack of consensus regarding its terminology. In particular, each new contribution seems to rely on its own (and often intuitive) version of terms like "explanation" and "interpretation". Such disarray encumbers the consolidation of advances in the field towards the fulfillment of scientific and regulatory demands e.g., when comparing methods or establishing their compliance with respect to biases and fairness constraints. We propose a theoretical framework that not only provides concrete definitions for these terms, but it also outlines all steps necessary to produce explanations and interpretations. The framework also allows for existing contributions to be re-contextualized such that their scope can be measured, thus making them comparable to other methods. We show that this framework is compliant with desiderata on explanations, on interpretability and on evaluation metrics. We present a use-case showing how the framework can be used to compare LIME, SHAP and MDNet, establishing their advantages and shortcomings. Finally, we discuss relevant trends in XAI as well as recommendations for future work, all from the standpoint of our framework. △ Less

Submitted 14 May, 2021; originally announced May 2021.

Journal ref: Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV) Workshops

arXiv:2012.03144 [pdf]

New Perspectives to Reduce Stress through Digital Humor

Authors: Misnal Munir, Amaliyah, Moses Glorino Rumambo Pandin

Abstract: This study aimed to find new perspectives on the use of humor through digital media. A qualitative approach was used to conduct this study, where data were collected through a literature review. Stress is caused by the inability of a person to adapt between desires and reality. All forms of stress are basically caused by a lack of understanding of human's own limitations. Inability to fight limita… ▽ More This study aimed to find new perspectives on the use of humor through digital media. A qualitative approach was used to conduct this study, where data were collected through a literature review. Stress is caused by the inability of a person to adapt between desires and reality. All forms of stress are basically caused by a lack of understanding of human's own limitations. Inability to fight limitations that will cause frustration, conflict, anxiety, and guilt. Too much stress can threaten a person's ability to deal with the environment. As a result, employees develop various kinds of stress symptoms that can interfere with their work performance. Thus, the management of work stress is important to do, one of which uses humor. However, in the digital age, the spread of humor can be easily facilitated. The results of this review article find new perspectives to reduce stress through digital humor, namely interactive humor, funny photos, manipulations, phanimation, celebrity soundboards, and PowerPoint humor. The research shows that the use of humor as a coping strategy is able to predict positive affect and well-being work-related. Moreover, digital humor which has various forms as well as easy, fast, and wide spread, then the effect is felt increasingly significant △ Less

Submitted 5 December, 2020; originally announced December 2020.

arXiv:2008.10148 [pdf, other]

Drive Safe: Cognitive-Behavioral Mining for Intelligent Transportation Cyber-Physical System

Authors: Md. Shirajum Munir, Sarder Fakhrul Abedin, Ki Tae Kim, Do Hyeon Kim, Md. Golam Rabiul Alam, Choong Seon Hong

Abstract: This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further,… ▽ More This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further, we develop a prototype of the proposed drive safe platform to establish proof-of-concept (PoC) for the road safety in IT-CPS. In the developed driving safety platform, we employ five AI and statistical-based models to infer a vehicle driver's cognitive-behavioral mining to ensure safe driving during the drive. Especially, capsule network (CN), maximum likelihood (ML), convolutional neural network (CNN), Apriori algorithm, and Bayesian network (BN) are deployed for driver activity recognition, environmental feature extraction, mood recognition, sequential pattern mining, and content recommendation for affective mood repairment of the driver, respectively. Besides, we develop a communication module to interact with the systems in IT-CPS asynchronously. Thus, the developed drive safe PoC can guide the vehicle drivers when they are distracted from driving due to the cognitive-behavioral factors. Finally, we have performed a qualitative evaluation to measure the usability and effectiveness of the developed drive safe platform. We observe that the P-value is 0.0041 (i.e., < 0.05) in the ANOVA test. Moreover, the confidence interval analysis also shows significant gains in prevalence value which is around 0.93 for a 95% confidence level. The aforementioned statistical results indicate high reliability in terms of driver's safety and mental state. △ Less

Submitted 23 August, 2020; originally announced August 2020.

Comments: Submitted to IEEE Transactions on Intelligent Transportation Systems, Special Issue on Technologies for risk mitigation and support of impaired drivers

arXiv:2007.13305 [pdf, other]

doi 10.1109/ACCESS.2020.3040821

Controlling the Outbreak of COVID-19: A Noncooperative Game Perspective

Authors: Anupam Kumar Bairagi, Mehedi Masud, Do Hyeon Kim, Md. Shirajum Munir, Abdullah Al Nahid, Sarder Fakhrul Abedin, Kazi Masudul Alam, Sujit Biswas, Sultan S Alshamrani, Zhu Han, Choong Seon Hong

Abstract: COVID-19 is a global epidemic. Till now, there is no remedy for this epidemic. However, isolation and social distancing are seemed to be effective preventive measures to control this pandemic. Therefore, in this paper, an optimization problem is formulated that accommodates both isolation and social distancing features of the individuals. To promote social distancing, we solve the formulated probl… ▽ More COVID-19 is a global epidemic. Till now, there is no remedy for this epidemic. However, isolation and social distancing are seemed to be effective preventive measures to control this pandemic. Therefore, in this paper, an optimization problem is formulated that accommodates both isolation and social distancing features of the individuals. To promote social distancing, we solve the formulated problem by applying a noncooperative game that can provide an incentive for maintaining social distancing to prevent the spread of COVID-19. Furthermore, the sustainability of the lockdown policy is interpreted with the help of our proposed game-theoretic incentive model for maintaining social distancing where there exists a Nash equilibrium. Finally, we perform an extensive numerical analysis that shows the effectiveness of the proposed approach in terms of achieving the desired social-distancing to prevent the outbreak of the COVID-19 in a noncooperative environment. Numerical results show that the individual incentive increases more than 85% with an increasing percentage of home isolation from 25% to 100% for all considered scenarios. The numerical results also demonstrate that in a particular percentage of home isolation, the individual incentive decreases with an increasing number of individuals. △ Less

Submitted 26 November, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

Comments: Accepted article by IEEE Access. DOI: 10.1109/ACCESS.2020.3040821

arXiv:2005.09329 [pdf, other]

Localizing Firearm Carriers by Identifying Human-Object Pairs

Authors: Abdul Basit, Muhammad Akhtar Munir, Mohsen Ali, Arif Mahmood

Abstract: Visual identification of gunmen in a crowd is a challenging problem, that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem, by defining human-object interaction (and non-interaction) bounding boxes. In a given image, human and firearms are separately detected. Each detected human is paired with each detected firearm, allow… ▽ More Visual identification of gunmen in a crowd is a challenging problem, that requires resolving the association of a person with an object (firearm). We present a novel approach to address this problem, by defining human-object interaction (and non-interaction) bounding boxes. In a given image, human and firearms are separately detected. Each detected human is paired with each detected firearm, allowing us to create a paired bounding box that contains both object and the human. A network is trained to classify these paired-bounding-boxes into human carrying the identified firearm or not. Extensive experiments were performed to evaluate effectiveness of the algorithm, including exploiting full pose of the human, hand key-points, and their association with the firearm. The knowledge of spatially localized features is key to success of our method by using multi-size proposals with adaptive average pooling. We have also extended a previously firearm detection dataset, by adding more images and tagging in extended dataset the human-firearm pairs (including bounding boxes for firearms and gunmen). The experimental results ($AP_{hold} = 78.5$) demonstrate effectiveness of the proposed method. △ Less

Submitted 20 May, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: 5 pages, accepted in IEEE ICIP 2020

arXiv:2003.04816 [pdf, other]

Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach

Authors: Sarder Fakhrul Abedin, Md. Shirajum Munir, Nguyen H. Tran, Zhu Han, Choong Seon Hong

Abstract: In this paper, we design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed to improve the data freshness and connectivity to the Internet of Things (IoT) devices. First, we formulate an energy-efficient trajectory optimization problem in which the objective is to maximize the energy efficiency by optimizing the UAV-BS trajectory policy.… ▽ More In this paper, we design a navigation policy for multiple unmanned aerial vehicles (UAVs) where mobile base stations (BSs) are deployed to improve the data freshness and connectivity to the Internet of Things (IoT) devices. First, we formulate an energy-efficient trajectory optimization problem in which the objective is to maximize the energy efficiency by optimizing the UAV-BS trajectory policy. We also incorporate different contextual information such as energy and age of information (AoI) constraints to ensure the data freshness at the ground BS. Second, we propose an agile deep reinforcement learning with experience replay model to solve the formulated problem concerning the contextual constraints for the UAV-BS navigation. Moreover, the proposed approach is well-suited for solving the problem, since the state space of the problem is extremely large and finding the best trajectory policy with useful contextual features is too complex for the UAV-BSs. By applying the proposed trained model, an effective real-time trajectory policy for the UAV-BSs captures the observable network states over time. Finally, the simulation results illustrate the proposed approach is 3.6% and 3.13% more energy efficient than those of the greedy and baseline deep Q Network (DQN) approaches. △ Less

Submitted 21 February, 2020; originally announced March 2020.

Comments: Submitted to IEEE Transactions on Intelligent Transportation Systems, Special Issue on Unmanned Aircraft System Traffic Management

arXiv:2003.04551 [pdf, other]

Coexistence Mechanism between eMBB and uRLLC in 5G Wireless Networks

Authors: Anupam Kumar Bairagi, Md. Shirajum Munir, Madyan Alsenwi, Nguyen H. Tran, Sultan S Alshamrani, Mehedi Masud, Zhu Han, Choong Seon Hong

Abstract: uRLLC and eMBB are two influential services of the emerging 5G cellular network. Latency and reliability are major concerns for uRLLC applications, whereas eMBB services claim for the maximum data rates. Owing to the trade-off among latency, reliability and spectral efficiency, sharing of radio resources between eMBB and uRLLC services, heads to a challenging scheduling dilemma. In this paper, we… ▽ More uRLLC and eMBB are two influential services of the emerging 5G cellular network. Latency and reliability are major concerns for uRLLC applications, whereas eMBB services claim for the maximum data rates. Owing to the trade-off among latency, reliability and spectral efficiency, sharing of radio resources between eMBB and uRLLC services, heads to a challenging scheduling dilemma. In this paper, we study the co-scheduling problem of eMBB and uRLLC traffic based upon the puncturing technique. Precisely, we formulate an optimization problem aiming to maximize the MEAR of eMBB UEs while fulfilling the provisions of the uRLLC traffic. We decompose the original problem into two sub-problems, namely scheduling problem of eMBB UEs and uRLLC UEs while prevailing objective unchanged. Radio resources are scheduled among the eMBB UEs on a time slot basis, whereas it is handled for uRLLC UEs on a mini-slot basis. Moreover, for resolving the scheduling issue of eMBB UEs, we use PSUM based algorithm, whereas the optimal TM is adopted for solving the same problem of uRLLC UEs. Furthermore, a heuristic algorithm is also provided to solve the first sub-problem with lower complexity. Finally, the significance of the proposed approach over other baseline approaches is established through numerical analysis in terms of the MEAR and fairness scores of the eMBB UEs. △ Less

Submitted 10 March, 2020; originally announced March 2020.

Comments: 30 pages, 11 figures, IEEE Transactions on Communications

arXiv:2003.02157 [pdf, other]

doi 10.1109/TNSM.2021.3049381

Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach

Authors: Md. Shirajum Munir, Sarder Fakhrul Abedin, Nguyen H. Tran, Zhu Han, Eui-Nam Huh, Choong Seon Hong

Abstract: In recent years, multi-access edge computing (MEC) is a key enabler for handling the massive expansion of Internet of Things (IoT) applications and services. However, energy consumption of a MEC network depends on volatile tasks that induces risk for energy demand estimations. As an energy supplier, a microgrid can facilitate seamless energy supply. However, the risk associated with energy supply… ▽ More In recent years, multi-access edge computing (MEC) is a key enabler for handling the massive expansion of Internet of Things (IoT) applications and services. However, energy consumption of a MEC network depends on volatile tasks that induces risk for energy demand estimations. As an energy supplier, a microgrid can facilitate seamless energy supply. However, the risk associated with energy supply is also increased due to unpredictable energy generation from renewable and non-renewable sources. Especially, the risk of energy shortfall is involved with uncertainties in both energy consumption and generation. In this paper, we study a risk-aware energy scheduling problem for a microgrid-powered MEC network. First, we formulate an optimization problem considering the conditional value-at-risk (CVaR) measurement for both energy consumption and generation, where the objective is to minimize the expected residual of scheduled energy for the MEC networks and we show this problem is an NP-hard problem. Second, we analyze our formulated problem using a multi-agent stochastic game that ensures the joint policy Nash equilibrium, and show the convergence of the proposed model. Third, we derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks. This method mitigates the curse of dimensionality of the state space and chooses the best policy among the agents for the proposed problem. Finally, the experimental results establish a significant performance gain by considering CVaR for high accuracy energy scheduling of the proposed model than both the single and random agent models. △ Less

Submitted 5 January, 2021; v1 submitted 20 February, 2020; originally announced March 2020.

Comments: Accepted Article BY IEEE Transactions on Network and Service Management, DOI: 10.1109/TNSM.2021.3049381

arXiv:2002.08567 [pdf, other]

doi 10.1109/TNSM.2021.3057960

Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems

Authors: Md. Shirajum Munir, Nguyen H. Tran, Walid Saad, Choong Seon Hong

Abstract: The stringent requirements of mobile edge computing (MEC) applications and functions fathom the high capacity and dense deployment of MEC hosts to the upcoming wireless networks. However, operating such high capacity MEC hosts can significantly increase energy consumption. Thus, a base station (BS) unit can act as a self-powered BS. In this paper, an effective energy dispatch mechanism for self-po… ▽ More The stringent requirements of mobile edge computing (MEC) applications and functions fathom the high capacity and dense deployment of MEC hosts to the upcoming wireless networks. However, operating such high capacity MEC hosts can significantly increase energy consumption. Thus, a base station (BS) unit can act as a self-powered BS. In this paper, an effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied. First, a two-stage linear stochastic programming problem is formulated with the goal of minimizing the total energy consumption cost of the system while fulfilling the energy demand. Second, a semi-distributed data-driven solution is proposed by developing a novel multi-agent meta-reinforcement learning (MAMRL) framework to solve the formulated problem. In particular, each BS plays the role of a local agent that explores a Markovian behavior for both energy consumption and generation while each BS transfers time-varying features to a meta-agent. Sequentially, the meta-agent optimizes (i.e., exploits) the energy dispatch decision by accepting only the observations from each local agent with its own state information. Meanwhile, each BS agent estimates its own energy dispatch policy by applying the learned parameters from meta-agent. Finally, the proposed MAMRL framework is benchmarked by analyzing deterministic, asymmetric, and stochastic environments in terms of non-renewable energy usages, energy cost, and accuracy. Experimental results show that the proposed MAMRL model can reduce up to 11% non-renewable energy usage and by 22.4% the energy cost (with 95.8% prediction accuracy), compared to other baseline methods. △ Less

Submitted 9 February, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

arXiv:1905.06175 [pdf, other]

doi 10.1007/978-3-030-30493-5_43

TSXplain: Demystification of DNN Decisions for Time-Series using Natural Language and Statistical Features

Authors: Mohsin Munir, Shoaib Ahmed Siddiqui, Ferdinand Küsters, Dominique Mercier, Andreas Dengel, Sheraz Ahmed

Abstract: Neural networks (NN) are considered as black-boxes due to the lack of explainability and transparency of their decisions. This significantly hampers their deployment in environments where explainability is essential along with the accuracy of the system. Recently, significant efforts have been made for the interpretability of these deep networks with the aim to open up the black-box. However, most… ▽ More Neural networks (NN) are considered as black-boxes due to the lack of explainability and transparency of their decisions. This significantly hampers their deployment in environments where explainability is essential along with the accuracy of the system. Recently, significant efforts have been made for the interpretability of these deep networks with the aim to open up the black-box. However, most of these approaches are specifically developed for visual modalities. In addition, the interpretations provided by these systems require expert knowledge and understanding for intelligibility. This indicates a vital gap between the explainability provided by the systems and the novice user. To bridge this gap, we present a novel framework i.e. Time-Series eXplanation (TSXplain) system which produces a natural language based explanation of the decision taken by a NN. It uses the extracted statistical features to describe the decision of a NN, merging the deep learning world with that of statistics. The two-level explanation provides ample description of the decision made by the network to aid an expert as well as a novice user alike. Our survey and reliability assessment test confirm that the generated explanations are meaningful and correct. We believe that generating natural language based descriptions of the network's decisions is a big step towards opening up the black-box. △ Less

Submitted 15 May, 2019; originally announced May 2019.

Comments: Pre-print

arXiv:1904.10032 [pdf, other]

Leveraging Orientation for Weakly Supervised Object Detection with Application to Firearm Localization

Authors: Javed Iqbal, Muhammad Akhtar Munir, Arif Mahmood, Afsheen Rafaqat Ali, Mohsen Ali

Abstract: Automatic detection of firearms is important for enhancing the security and safety of people, however, it is a challenging task owing to the wide variations in shape, size, and appearance of firearms. Also, most of the generic object detectors process axis-aligned rectangular areas though, a thin and long rifle may actually cover only a small percentage of that area and the rest may contain irrele… ▽ More Automatic detection of firearms is important for enhancing the security and safety of people, however, it is a challenging task owing to the wide variations in shape, size, and appearance of firearms. Also, most of the generic object detectors process axis-aligned rectangular areas though, a thin and long rifle may actually cover only a small percentage of that area and the rest may contain irrelevant details suppressing the required object signatures. To handle these challenges, we propose a weakly supervised Orientation Aware Object Detection (OAOD) algorithm which learns to detect oriented object bounding boxes (OBB) while using AxisAligned Bounding Boxes (AABB) for training. The proposed OAOD is different from the existing oriented object detectors which strictly require OBB during training which may not always be present. The goal of training on AABB and detection of OBB is achieved by employing a multistage scheme, with Stage-1 predicting the AABB and Stage-2 predicting OBB. In-between the two stages, the oriented proposal generation module along with the object aligned RoI pooling is designed to extract features based on the predicted orientation and to make these features orientation invariant. A diverse and challenging dataset consisting of eleven thousand images is also proposed for firearm detection which is manually annotated for firearm classification and localization. The proposed ITU Firearm dataset (ITUF) contains a wide range of guns and rifles. The OAOD algorithm is evaluated on the ITUF dataset and compared with current state-of-the-art object detectors, including fully supervised oriented object detectors. OAOD has outperformed both types of object detectors with a significant margin. The experimental results (mAP: 88.3 on AABB & mAP: 77.5 on OBB) demonstrate the effectiveness of the proposed algorithm for firearm detection. △ Less

Submitted 29 January, 2021; v1 submitted 22 April, 2019; originally announced April 2019.

Comments: Accepted for Publication in Neurocomputing

arXiv:1802.07906 [pdf]

doi 10.1007/s10035-018-0847-5

Stability of Granular Tunnel

Authors: Elfi Yuliza, Nadya Amalia, Handika Dany Rahmayanti, Rahmawati Munir, Muhammad Miftahul Munir, Khairurrijal Khairurrijal, Mikrajuddin Abdullah

Abstract: We demonstrated the stability of tunnels made of granular matters is strongly dependent on the grain size, tunnel diameter, and water content in the granules. Larger tunnel radius, larger grain size, and too much water content tend to destabilize the tunnel. We also develop a model to describe such findings. We identified a phase diagram of stability which greatly controlled by granular bond order… ▽ More We demonstrated the stability of tunnels made of granular matters is strongly dependent on the grain size, tunnel diameter, and water content in the granules. Larger tunnel radius, larger grain size, and too much water content tend to destabilize the tunnel. We also develop a model to describe such findings. We identified a phase diagram of stability which greatly controlled by granular bond order. For granular bond order of larger than unity, we can always made a stable tunnel. However, for granular bond order of less than unity, we obtain a general expression for maximum tunnel thickness that can be made. To best of our knowledge, this is the first exploration regarding the granular tunnel stability. △ Less

Submitted 22 February, 2018; originally announced February 2018.

Comments: 13 pages, 6 Figures, and 1 Table

arXiv:1802.02952 [pdf, other]

doi 10.1109/ACCESS.2019.2912823

TSViz: Demystification of Deep Learning Models for Time-Series Analysis

Authors: Shoaib Ahmed Siddiqui, Dominik Mercier, Mohsin Munir, Andreas Dengel, Sheraz Ahmed

Abstract: This paper presents a novel framework for demystification of convolutional deep learning models for time-series analysis. This is a step towards making informed/explainable decisions in the domain of time-series, powered by deep learning. There have been numerous efforts to increase the interpretability of image-centric deep neural network models, where the learned features are more intuitive to v… ▽ More This paper presents a novel framework for demystification of convolutional deep learning models for time-series analysis. This is a step towards making informed/explainable decisions in the domain of time-series, powered by deep learning. There have been numerous efforts to increase the interpretability of image-centric deep neural network models, where the learned features are more intuitive to visualize. Visualization in time-series domain is much more complicated as there is no direct interpretation of the filters and inputs as compared to the image modality. In addition, little or no concentration has been devoted for the development of such tools in the domain of time-series in the past. TSViz provides possibilities to explore and analyze a network from different dimensions at different levels of abstraction which includes identification of parts of the input that were responsible for a prediction (including per filter saliency), importance of different filters present in the network for a particular prediction, notion of diversity present in the network through filter clustering, understanding of the main sources of variation learnt by the network through inverse optimization, and analysis of the network's robustness against adversarial noise. As a sanity check for the computed influence values, we demonstrate results regarding pruning of neural networks based on the computed influence information. These representations allow to understand the network features so that the acceptability of deep networks for time-series data can be enhanced. This is extremely important in domains like finance, industry 4.0, self-driving cars, health-care, counter-terrorism etc., where reasons for reaching a particular prediction are equally important as the prediction itself. We assess the proposed framework for interpretability with a set of desirable properties essential for any method. △ Less

Submitted 5 May, 2020; v1 submitted 8 February, 2018; originally announced February 2018.

Comments: 7 Pages (6 + 1 for references), 7 figures

Report number: ACCESS.2019.2912823

Journal ref: IEEE Access 2019 PP(99):1-1

arXiv:1402.0247 [pdf]

Secure Debit Card Device Model

Authors: Ms. Rumaisah Munir

Abstract: The project envisages the implementation of an e-payment system utilizing FIPS-201 Smart Card. The system combines hardware and software modules. The hardware module takes data insertions (e.g. currency notes), processes the data and then creates connection with the smart card using serial/USB ports to perform further mathematical manipulations. The hardware interacts with servers at the back for… ▽ More The project envisages the implementation of an e-payment system utilizing FIPS-201 Smart Card. The system combines hardware and software modules. The hardware module takes data insertions (e.g. currency notes), processes the data and then creates connection with the smart card using serial/USB ports to perform further mathematical manipulations. The hardware interacts with servers at the back for authentication and identification of users and for data storage pertaining to a particular user. The software module manages database, handles identities, provide authentication and secure communication between the various system components. It will also provide a component to the end users. This component can be in the form of software for computer or executable binaries for PoS devices. The idea is to receive data in the embedded system from data reader and smart card. After manipulations, the updated data is imprinted on smart card memory and also updated in the back end servers maintaining database. The information to be sent to a server is sent through a PoS device which has multiple transfer mediums involving wired and un-wired mediums. The user device also acts as an updater; therefore, whenever the smart card is inserted by user, it is automatically updated by synchronizing with back-end database. The project required expertise in embedded systems, networks, java and C++ (Optional). △ Less

Submitted 2 February, 2014; originally announced February 2014.

Comments: Royal Institute of Technology, KTH

arXiv:0912.0169 [pdf, ps, other]

Classification of compact homogeneous spaces with invariant $G_2$-structures

Authors: Hong Van Le, Mobeen Munir

Abstract: In this note we classify all homogeneous spaces $G/H$ admitting a $G$-invariant $G_2$-structure, assuming that $G$ is a compact Lie group and $G$ acts effectively on $G/H$. They include a subclass of all homogeneous spaces $G/H$ with a $G$-invariant $\tilde G_2$-structure, where $G$ is a compact Lie group. There are many new examples with nontrivial fundamental group. We study a subclass of homoge… ▽ More In this note we classify all homogeneous spaces $G/H$ admitting a $G$-invariant $G_2$-structure, assuming that $G$ is a compact Lie group and $G$ acts effectively on $G/H$. They include a subclass of all homogeneous spaces $G/H$ with a $G$-invariant $\tilde G_2$-structure, where $G$ is a compact Lie group. There are many new examples with nontrivial fundamental group. We study a subclass of homogeneous spaces of high rigidity and low rigidity and show that they admit families of invariant coclosed $G_2$-structures (resp. $\tilde G_2$-structures). △ Less

Submitted 20 December, 2010; v1 submitted 1 December, 2009; originally announced December 2009.

Comments: final version, 24p

MSC Class: 57M50; 57M60

Journal ref: Advances in Geometry, 12(2012), 303-328

Showing 1–49 of 49 results for author: Munir, M