-
Quantum-Resistant Domain Name System: A Comprehensive System-Level Study
Authors:
Juyoul Lee,
Sanzida Hoque,
Abdullah Aydeger,
Engin Zeydan
Abstract:
The Domain Name System (DNS) plays a foundational role in Internet infrastructure, yet its core protocols remain vulnerable to compromise by quantum adversaries. As cryptographically relevant quantum computers become a realistic threat, ensuring DNS confidentiality, authenticity, and integrity in the post-quantum era is imperative. In this paper, we present a comprehensive system-level study of po…
▽ More
The Domain Name System (DNS) plays a foundational role in Internet infrastructure, yet its core protocols remain vulnerable to compromise by quantum adversaries. As cryptographically relevant quantum computers become a realistic threat, ensuring DNS confidentiality, authenticity, and integrity in the post-quantum era is imperative. In this paper, we present a comprehensive system-level study of post-quantum DNS security across three widely deployed mechanisms: DNSSEC, DNS-over-TLS (DoT), and DNS-over-HTTPS (DoH). We propose Post-Quantum Cryptographic (PQC)-DNS, a unified framework for benchmarking DNS security under legacy, post-quantum, and hybrid cryptographic configurations. Our implementation leverages the Open Quantum Safe (OQS) libraries and integrates lattice- and hash-based primitives into BIND9 and TLS 1.3 stacks. We formalize performance and threat models and analyze the impact of post-quantum key encapsulation and digital signatures on end-to-end DNS resolution. Experimental results on a containerized testbed reveal that lattice-based primitives such as Module-Lattice-Based Key-Encapsulation Mechanism (MLKEM) and Falcon offer practical latency and resource profiles, while hash-based schemes like SPHINCS+ significantly increase message sizes and processing overhead. We also examine security implications including downgrade risks, fragmentation vulnerabilities, and susceptibility to denial-of-service amplification. Our findings inform practical guidance for deploying quantum-resilient DNS and contribute to the broader effort of securing core Internet protocols for the post-quantum future.
△ Less
Submitted 24 June, 2025;
originally announced June 2025.
-
IGraSS: Learning to Identify Infrastructure Networks from Satellite Imagery by Iterative Graph-constrained Semantic Segmentation
Authors:
Oishee Bintey Hoque,
Abhijin Adiga,
Aniruddha Adiga,
Siddharth Chaudhary,
Madhav V. Marathe,
S. S. Ravi,
Kirti Rajagopalan,
Amanda Wilson,
Samarth Swarup
Abstract:
Accurate canal network mapping is essential for water management, including irrigation planning and infrastructure maintenance. State-of-the-art semantic segmentation models for infrastructure mapping, such as roads, rely on large, well-annotated remote sensing datasets. However, incomplete or inadequate ground truth can hinder these learning approaches. Many infrastructure networks have graph-lev…
▽ More
Accurate canal network mapping is essential for water management, including irrigation planning and infrastructure maintenance. State-of-the-art semantic segmentation models for infrastructure mapping, such as roads, rely on large, well-annotated remote sensing datasets. However, incomplete or inadequate ground truth can hinder these learning approaches. Many infrastructure networks have graph-level properties such as reachability to a source (like canals) or connectivity (roads) that can be leveraged to improve these existing ground truth. This paper develops a novel iterative framework IGraSS, combining a semantic segmentation module-incorporating RGB and additional modalities (NDWI, DEM)-with a graph-based ground-truth refinement module. The segmentation module processes satellite imagery patches, while the refinement module operates on the entire data viewing the infrastructure network as a graph. Experiments show that IGraSS reduces unreachable canal segments from around 18% to 3%, and training with refined ground truth significantly improves canal identification. IGraSS serves as a robust framework for both refining noisy ground truth and mapping canal networks from remote sensing imagery. We also demonstrate the effectiveness and generalizability of IGraSS using road networks as an example, applying a different graph-theoretic constraint to complete road networks.
△ Less
Submitted 10 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
Personalized Large Language Models Can Increase the Belief Accuracy of Social Networks
Authors:
Adiba Mahbub Proma,
Neeley Pate,
Sean Kelty,
Gourab Ghoshal,
James N. Druckman,
Ehsan Hoque
Abstract:
Large language models (LLMs) are increasingly involved in shaping public understanding on contested issues. This has led to substantial discussion about the potential of LLMs to reinforce or correct misperceptions. While existing literature documents the impact of LLMs on individuals' beliefs, limited work explores how LLMs affect social networks. We address this gap with a pre-registered experime…
▽ More
Large language models (LLMs) are increasingly involved in shaping public understanding on contested issues. This has led to substantial discussion about the potential of LLMs to reinforce or correct misperceptions. While existing literature documents the impact of LLMs on individuals' beliefs, limited work explores how LLMs affect social networks. We address this gap with a pre-registered experiment (N = 1265) around the 2024 US presidential election, where we empirically explore the impact of personalized LLMs on belief accuracy in the context of social networks. The LLMs are constructed to be personalized, offering messages tailored to individuals' profiles, and to have guardrails for accurate information retrieval. We find that the presence of a personalized LLM leads individuals to update their beliefs towards the truth. More importantly, individuals with a personalized LLM in their social network not only choose to follow it, indicating they would like to obtain information from it in subsequent interactions, but also construct subsequent social networks to include other individuals with beliefs similar to the LLM -- in this case, more accurate beliefs. Therefore, our results show that LLMs have the capacity to influence individual beliefs and the social networks in which people exist, and highlight the potential of LLMs to act as corrective agents in online environments. Our findings can inform future strategies for responsible AI-mediated communication.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction via LLMs-as-the-Judge
Authors:
Md Tahmid Rahman Laskar,
Israt Jahan,
Elham Dolatabadi,
Chun Peng,
Enamul Hoque,
Jimmy Huang
Abstract:
Large Language Models (LLMs) have demonstrated impressive performance in biomedical relation extraction, even in zero-shot scenarios. However, evaluating LLMs in this task remains challenging due to their ability to generate human-like text, often producing synonyms or abbreviations of gold-standard answers, making traditional automatic evaluation metrics unreliable. On the other hand, while human…
▽ More
Large Language Models (LLMs) have demonstrated impressive performance in biomedical relation extraction, even in zero-shot scenarios. However, evaluating LLMs in this task remains challenging due to their ability to generate human-like text, often producing synonyms or abbreviations of gold-standard answers, making traditional automatic evaluation metrics unreliable. On the other hand, while human evaluation is more reliable, it is costly and time-consuming, making it impractical for real-world applications. This paper investigates the use of LLMs-as-the-Judge as an alternative evaluation method for biomedical relation extraction. We benchmark 8 LLMs as judges to evaluate the responses generated by 5 other LLMs across 3 biomedical relation extraction datasets. Unlike other text-generation tasks, we observe that LLM-based judges perform quite poorly (usually below 50% accuracy) in the biomedical relation extraction task. Our findings reveal that it happens mainly because relations extracted by LLMs do not adhere to any standard format. To address this, we propose structured output formatting for LLM-generated responses that helps LLM-Judges to improve their performance by about 15% (on average). We also introduce a domain adaptation technique to further enhance LLM-Judge performance by effectively transferring knowledge between datasets. We release both our human-annotated and LLM-annotated judgment data (36k samples in total) for public use here: https://github.com/tahmedge/llm_judge_biomedical_re.
△ Less
Submitted 31 May, 2025;
originally announced June 2025.
-
An Exploratory Approach Towards Investigating and Explaining Vision Transformer and Transfer Learning for Brain Disease Detection
Authors:
Shuvashis Sarker,
Shamim Rahim Refat,
Faika Fairuj Preotee,
Shifat Islam,
Tashreef Muhammad,
Mohammad Ashraful Hoque
Abstract:
The brain is a highly complex organ that manages many important tasks, including movement, memory and thinking. Brain-related conditions, like tumors and degenerative disorders, can be hard to diagnose and treat. Magnetic Resonance Imaging (MRI) serves as a key tool for identifying these conditions, offering high-resolution images of brain structures. Despite this, interpreting MRI scans can be co…
▽ More
The brain is a highly complex organ that manages many important tasks, including movement, memory and thinking. Brain-related conditions, like tumors and degenerative disorders, can be hard to diagnose and treat. Magnetic Resonance Imaging (MRI) serves as a key tool for identifying these conditions, offering high-resolution images of brain structures. Despite this, interpreting MRI scans can be complicated. This study tackles this challenge by conducting a comparative analysis of Vision Transformer (ViT) and Transfer Learning (TL) models such as VGG16, VGG19, Resnet50V2, MobilenetV2 for classifying brain diseases using MRI data from Bangladesh based dataset. ViT, known for their ability to capture global relationships in images, are particularly effective for medical imaging tasks. Transfer learning helps to mitigate data constraints by fine-tuning pre-trained models. Furthermore, Explainable AI (XAI) methods such as GradCAM, GradCAM++, LayerCAM, ScoreCAM, and Faster-ScoreCAM are employed to interpret model predictions. The results demonstrate that ViT surpasses transfer learning models, achieving a classification accuracy of 94.39%. The integration of XAI methods enhances model transparency, offering crucial insights to aid medical professionals in diagnosing brain diseases with greater precision.
△ Less
Submitted 22 June, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
Reading.help: Supporting EFL Readers with Proactive and On-Demand Explanation of English Grammar and Semantics
Authors:
Sunghyo Chung,
Hyeon Jeon,
Sungbok Shin,
Md Naimul Hoque
Abstract:
A large portion of texts in the world is written in English, but readers who see English as a Foreign Language (EFL) often struggle to read texts written in English accurately and swiftly. In many countries, EFL readers seek help from professional teachers and mentors, which is limited and costly. In this paper, we explore how an intelligent reading tool can assist EFL readers. To support our rese…
▽ More
A large portion of texts in the world is written in English, but readers who see English as a Foreign Language (EFL) often struggle to read texts written in English accurately and swiftly. In many countries, EFL readers seek help from professional teachers and mentors, which is limited and costly. In this paper, we explore how an intelligent reading tool can assist EFL readers. To support our research agenda, we conducted a case study with EFL readers in South Korea. We at first developed an LLM-based reading tool based on prior literature. We then revised the tool based on the feedback from a study with 15 South Korean EFL readers. The final tool, named Reading.help, helps EFL readers comprehend complex sentences and paragraphs with on-demand and proactive explanations. We finally evaluated the tool with 5 EFL readers and 2 EFL education professionals. Our findings suggest Reading.help could potentially help EFL readers self-learn english when they do not have access to any external support.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Authors:
Ryan Hoque,
Peide Huang,
David J. Yoon,
Mouli Sivapurapu,
Jian Zhang
Abstract:
Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object man…
▽ More
Imitation learning for manipulation has a well-known data scarcity problem. Unlike natural language and 2D computer vision, there is no Internet-scale corpus of data for dexterous manipulation. One appealing option is egocentric human video, a passively scalable data source. However, existing large-scale datasets such as Ego4D do not have native hand pose annotations and do not focus on object manipulation. To this end, we use Apple Vision Pro to collect EgoDex: the largest and most diverse dataset of dexterous human manipulation to date. EgoDex has 829 hours of egocentric video with paired 3D hand and finger tracking data collected at the time of recording, where multiple calibrated cameras and on-device SLAM can be used to precisely track the pose of every joint of each hand. The dataset covers a wide range of diverse manipulation behaviors with everyday household objects in 194 different tabletop tasks ranging from tying shoelaces to folding laundry. Furthermore, we train and systematically evaluate imitation learning policies for hand trajectory prediction on the dataset, introducing metrics and benchmarks for measuring progress in this increasingly important area. By releasing this large-scale dataset, we hope to push the frontier of robotics, computer vision, and foundation models.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
Authors:
Md Tahmid Rahman Laskar,
Mohammed Saidul Islam,
Ridwan Mahbub,
Ahmed Masry,
Mizanur Rahman,
Amran Bhuiyan,
Mir Tafseer Nayeem,
Shafiq Joty,
Enamul Hoque,
Jimmy Huang
Abstract:
Charts are ubiquitous as they help people understand and reason with data. Recently, various downstream tasks, such as chart question answering, chart2text, and fact-checking, have emerged. Large Vision-Language Models (LVLMs) show promise in tackling these tasks, but their evaluation is costly and time-consuming, limiting real-world deployment. While using LVLMs as judges to assess the chart comp…
▽ More
Charts are ubiquitous as they help people understand and reason with data. Recently, various downstream tasks, such as chart question answering, chart2text, and fact-checking, have emerged. Large Vision-Language Models (LVLMs) show promise in tackling these tasks, but their evaluation is costly and time-consuming, limiting real-world deployment. While using LVLMs as judges to assess the chart comprehension capabilities of other LVLMs could streamline evaluation processes, challenges like proprietary datasets, restricted access to powerful models, and evaluation costs hinder their adoption in industrial settings. To this end, we present a comprehensive evaluation of 13 open-source LVLMs as judges for diverse chart comprehension and reasoning tasks. We design both pairwise and pointwise evaluation tasks covering criteria like factual correctness, informativeness, and relevancy. Additionally, we analyze LVLM judges based on format adherence, positional consistency, length bias, and instruction-following. We focus on cost-effective LVLMs (<10B parameters) suitable for both research and commercial use, following a standardized evaluation protocol and rubric to measure the LVLM judge's accuracy. Experimental results reveal notable variability: while some open LVLM judges achieve GPT-4-level evaluation performance (about 80% agreement with GPT-4 judgments), others struggle (below ~10% agreement). Our findings highlight that state-of-the-art open-source LVLMs can serve as cost-effective automatic evaluators for chart-related tasks, though biases such as positional preference and length bias persist.
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
Knowledge-Informed Deep Learning for Irrigation Type Mapping from Remote Sensing
Authors:
Oishee Bintey Hoque,
Nibir Chandra Mandal,
Abhijin Adiga,
Samarth Swarup,
Sayjro Kossi Nouwakpo,
Amanda Wilson,
Madhav Marathe
Abstract:
Accurate mapping of irrigation methods is crucial for sustainable agricultural practices and food systems. However, existing models that rely solely on spectral features from satellite imagery are ineffective due to the complexity of agricultural landscapes and limited training data, making this a challenging problem. We present Knowledge-Informed Irrigation Mapping (KIIM), a novel Swin-Transforme…
▽ More
Accurate mapping of irrigation methods is crucial for sustainable agricultural practices and food systems. However, existing models that rely solely on spectral features from satellite imagery are ineffective due to the complexity of agricultural landscapes and limited training data, making this a challenging problem. We present Knowledge-Informed Irrigation Mapping (KIIM), a novel Swin-Transformer based approach that uses (i) a specialized projection matrix to encode crop to irrigation probability, (ii) a spatial attention map to identify agricultural lands from non-agricultural lands, (iii) bi-directional cross-attention to focus complementary information from different modalities, and (iv) a weighted ensemble for combining predictions from images and crop information. Our experimentation on five states in the US shows up to 22.9\% (IoU) improvement over baseline with a 71.4% (IoU) improvement for hard-to-classify drip irrigation. In addition, we propose a two-phase transfer learning approach to enhance cross-state irrigation mapping, achieving a 51% IoU boost in a state with limited labeled data. The ability to achieve baseline performance with only 40% of the training data highlights its efficiency, reducing the dependency on extensive manual labeling efforts and making large-scale, automated irrigation mapping more feasible and cost-effective.
△ Less
Submitted 5 June, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
IrrMap: A Large-Scale Comprehensive Dataset for Irrigation Method Mapping
Authors:
Nibir Chandra Mandal,
Oishee Bintey Hoque,
Abhijin Adiga,
Samarth Swarup,
Mandy Wilson,
Lu Feng,
Yangfeng Ji,
Miaomiao Zhang,
Geoffrey Fox,
Madhav Marathe
Abstract:
We introduce IrrMap, the first large-scale dataset (1.1 million patches) for irrigation method mapping across regions. IrrMap consists of multi-resolution satellite imagery from LandSat and Sentinel, along with key auxiliary data such as crop type, land use, and vegetation indices. The dataset spans 1,687,899 farms and 14,117,330 acres across multiple western U.S. states from 2013 to 2023, providi…
▽ More
We introduce IrrMap, the first large-scale dataset (1.1 million patches) for irrigation method mapping across regions. IrrMap consists of multi-resolution satellite imagery from LandSat and Sentinel, along with key auxiliary data such as crop type, land use, and vegetation indices. The dataset spans 1,687,899 farms and 14,117,330 acres across multiple western U.S. states from 2013 to 2023, providing a rich and diverse foundation for irrigation analysis and ensuring geospatial alignment and quality control. The dataset is ML-ready, with standardized 224x224 GeoTIFF patches, the multiple input modalities, carefully chosen train-test-split data, and accompanying dataloaders for seamless deep learning model training andbenchmarking in irrigation mapping. The dataset is also accompanied by a complete pipeline for dataset generation, enabling researchers to extend IrrMap to new regions for irrigation data collection or adapt it with minimal effort for other similar applications in agricultural and geospatial analysis. We also analyze the irrigation method distribution across crop groups, spatial irrigation patterns (using Shannon diversity indices), and irrigated area variations for both LandSat and Sentinel, providing insights into regional and resolution-based differences. To promote further exploration, we openly release IrrMap, along with the derived datasets, benchmark models, and pipeline code, through a GitHub repository: https://github.com/Nibir088/IrrMap and Data repository: https://huggingface.co/Nibir/IrrMap, providing comprehensive documentation and implementation details.
△ Less
Submitted 31 May, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
AI Standardized Patient Improves Human Conversations in Advanced Cancer Care
Authors:
Kurtis Haut,
Masum Hasan,
Thomas Carroll,
Ronald Epstein,
Taylan Sen,
Ehsan Hoque
Abstract:
Serious illness communication (SIC) in end-of-life care faces challenges such as emotional stress, cultural barriers, and balancing hope with honesty. Despite its importance, one of the few available ways for clinicians to practice SIC is with standardized patients, which is expensive, time-consuming, and inflexible. In this paper, we present SOPHIE, an AI-powered standardized patient simulation a…
▽ More
Serious illness communication (SIC) in end-of-life care faces challenges such as emotional stress, cultural barriers, and balancing hope with honesty. Despite its importance, one of the few available ways for clinicians to practice SIC is with standardized patients, which is expensive, time-consuming, and inflexible. In this paper, we present SOPHIE, an AI-powered standardized patient simulation and automated feedback system. SOPHIE combines large language models (LLMs), a lifelike virtual avatar, and automated, personalized feedback based on clinical literature to provide remote, on-demand SIC training. In a randomized control study with healthcare students and professionals, SOPHIE users demonstrated significant improvement across three critical SIC domains: Empathize, Be Explicit, and Empower. These results suggest that AI-driven tools can enhance complex interpersonal communication skills, offering scalable, accessible solutions to address a critical gap in clinician education.
△ Less
Submitted 5 May, 2025;
originally announced May 2025.
-
CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models
Authors:
Hasan Md Tusfiqur Alam,
Devansh Srivastav,
Abdulrahman Mohamed Selim,
Md Abdul Kadir,
Md Moktadirul Hoque Shuvo,
Daniel Sonntag
Abstract:
Advancements in generative Artificial Intelligence (AI) hold great promise for automating radiology workflows, yet challenges in interpretability and reliability hinder clinical adoption. This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system to bridge AI performance with c…
▽ More
Advancements in generative Artificial Intelligence (AI) hold great promise for automating radiology workflows, yet challenges in interpretability and reliability hinder clinical adoption. This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system to bridge AI performance with clinical explainability. CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification. Meanwhile, the RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports. Our demonstration showcases the system's ability to deliver interpretable predictions, mitigate hallucinations, and generate high-quality, tailored reports with an interactive interface addressing accuracy, trust, and usability challenges. This framework provides a pathway to improving diagnostic consistency and empowering radiologists with actionable insights.
△ Less
Submitted 4 May, 2025; v1 submitted 29 April, 2025;
originally announced April 2025.
-
EPSILON: Adaptive Fault Mitigation in Approximate Deep Neural Network using Statistical Signatures
Authors:
Khurram Khalil,
Khaza Anuarul Hoque
Abstract:
The increasing adoption of approximate computing in deep neural network accelerators (AxDNNs) promises significant energy efficiency gains. However, permanent faults in AxDNNs can severely degrade their performance compared to their accurate counterparts (AccDNNs). Traditional fault detection and mitigation approaches, while effective for AccDNNs, introduce substantial overhead and latency, making…
▽ More
The increasing adoption of approximate computing in deep neural network accelerators (AxDNNs) promises significant energy efficiency gains. However, permanent faults in AxDNNs can severely degrade their performance compared to their accurate counterparts (AccDNNs). Traditional fault detection and mitigation approaches, while effective for AccDNNs, introduce substantial overhead and latency, making them impractical for energy-constrained real-time deployment. To address this, we introduce EPSILON, a lightweight framework that leverages pre-computed statistical signatures and layer-wise importance metrics for efficient fault detection and mitigation in AxDNNs. Our framework introduces a novel non-parametric pattern-matching algorithm that enables constant-time fault detection without interrupting normal execution while dynamically adapting to different network architectures and fault patterns. EPSILON maintains model accuracy by intelligently adjusting mitigation strategies based on a statistical analysis of weight distribution and layer criticality while preserving the energy benefits of approximate computing. Extensive evaluations across various approximate multipliers, AxDNN architectures, popular datasets (MNIST, CIFAR-10, CIFAR-100, ImageNet-1k), and fault scenarios demonstrate that EPSILON maintains 80.05\% accuracy while offering 22\% improvement in inference time and 28\% improvement in energy efficiency, establishing EPSILON as a practical solution for deploying reliable AxDNNs in safety-critical edge applications.
△ Less
Submitted 24 April, 2025;
originally announced April 2025.
-
ApproXAI: Energy-Efficient Hardware Acceleration of Explainable AI using Approximate Computing
Authors:
Ayesha Siddique,
Khurram Khalil,
Khaza Anuarul Hoque
Abstract:
Explainable artificial intelligence (XAI) enhances AI system transparency by framing interpretability as an optimization problem. However, this approach often necessitates numerous iterations of computationally intensive operations, limiting its applicability in real-time scenarios. While recent research has focused on XAI hardware acceleration on FPGAs and TPU, these methods do not fully address…
▽ More
Explainable artificial intelligence (XAI) enhances AI system transparency by framing interpretability as an optimization problem. However, this approach often necessitates numerous iterations of computationally intensive operations, limiting its applicability in real-time scenarios. While recent research has focused on XAI hardware acceleration on FPGAs and TPU, these methods do not fully address energy efficiency in real-time settings. To address this limitation, we propose XAIedge, a novel framework that leverages approximate computing techniques into XAI algorithms, including integrated gradients, model distillation, and Shapley analysis. XAIedge translates these algorithms into approximate matrix computations and exploits the synergy between convolution, Fourier transform, and approximate computing paradigms. This approach enables efficient hardware acceleration on TPU-based edge devices, facilitating faster real-time outcome interpretations. Our comprehensive evaluation demonstrates that XAIedge achieves a $2\times$ improvement in energy efficiency compared to existing accurate XAI hardware acceleration techniques while maintaining comparable accuracy. These results highlight the potential of XAIedge to significantly advance the deployment of explainable AI in energy-constrained real-time applications.
△ Less
Submitted 12 May, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
DashGuide: Authoring Interactive Dashboard Tours for Guiding Dashboard Users
Authors:
Naimul Hoque,
Nicole Sultanum
Abstract:
Dashboard guidance helps dashboard users better navigate interactive features, understand the underlying data, and assess insights they can potentially extract from dashboards. However, authoring dashboard guidance is a time consuming task, and embedding guidance into dashboards for effective delivery is difficult to realize. In this work, we contribute DashGuide, a framework and system to support…
▽ More
Dashboard guidance helps dashboard users better navigate interactive features, understand the underlying data, and assess insights they can potentially extract from dashboards. However, authoring dashboard guidance is a time consuming task, and embedding guidance into dashboards for effective delivery is difficult to realize. In this work, we contribute DashGuide, a framework and system to support the creation of interactive dashboard guidance with minimal authoring input. Given a dashboard and a communication goal, DashGuide captures a sequence of author-performed interactions to generate guidance materials delivered as playable step-by-step overlays, a.k.a., dashboard tours. Authors can further edit and refine individual tour steps while receiving generative assistance. We also contribute findings from a formative assessment with 9 dashboard creators, which helped inform the design of DashGuide; and findings from an evaluation of DashGuide with 12 dashboard creators, suggesting it provides an improved authoring experience that balances efficiency, expressiveness, and creative freedom.
△ Less
Submitted 23 April, 2025;
originally announced April 2025.
-
Prognosis Of Lithium-Ion Battery Health with Hybrid EKF-CNN+LSTM Model Using Differential Capacity
Authors:
Md Azizul Hoque,
Babul Salam,
Mohd Khair Hassan,
Abdulkabir Aliyu,
Abedalmuhdi Almomany,
Muhammed Sutcu
Abstract:
Battery degradation is a major challenge in electric vehicles (EV) and energy storage systems (ESS). However, most degradation investigations focus mainly on estimating the state of charge (SOC), which fails to accurately interpret the cells' internal degradation mechanisms. Differential capacity analysis (DCA) focuses on the rate of change of cell voltage about the change in cell capacity, under…
▽ More
Battery degradation is a major challenge in electric vehicles (EV) and energy storage systems (ESS). However, most degradation investigations focus mainly on estimating the state of charge (SOC), which fails to accurately interpret the cells' internal degradation mechanisms. Differential capacity analysis (DCA) focuses on the rate of change of cell voltage about the change in cell capacity, under various charge/discharge rates. This paper developed a battery cell degradation testing model that used two types of lithium-ions (Li-ion) battery cells, namely lithium nickel cobalt aluminium oxides (LiNiCoAlO2) and lithium iron phosphate (LiFePO4), to evaluate internal degradation during loading conditions. The proposed battery degradation model contains distinct charge rates (DCR) of 0.2C, 0.5C, 1C, and 1.5C, as well as discharge rates (DDR) of 0.5C, 0.9C, 1.3C, and 1.6C to analyze the internal health and performance of battery cells during slow, moderate, and fast loading conditions. Besides, this research proposed a model that incorporates the Extended Kalman Filter (EKF), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) networks to validate experimental data. The proposed model yields excellent modelling results based on mean squared error (MSE), and root mean squared error (RMSE), with errors of less than 0.001% at DCR and DDR. The peak identification technique (PIM) has been utilized to investigate battery health based on the number of peaks, peak position, peak height, peak area, and peak width. At last, the PIM method has discovered that the cell aged gradually under normal loading rates but deteriorated rapidly under fast loading conditions. Overall, LiFePO4 batteries perform more robustly and consistently than (LiNiCoAlO2) cells under varying loading conditions.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
-
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
Authors:
Ahmed Masry,
Mohammed Saidul Islam,
Mahir Ahmed,
Aayush Bajaj,
Firoz Kabir,
Aaryaman Kartha,
Md Tahmid Rahman Laskar,
Mizanur Rahman,
Shadikur Rahman,
Mehrad Shahmohammadi,
Megh Thakkar,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty
Abstract:
Charts are ubiquitous, as people often use them to analyze data, answer questions, and discover critical insights. However, performing complex analytical tasks with charts requires significant perceptual and cognitive effort. Chart Question Answering (CQA) systems automate this process by enabling models to interpret and reason with visual representations of data. However, existing benchmarks like…
▽ More
Charts are ubiquitous, as people often use them to analyze data, answer questions, and discover critical insights. However, performing complex analytical tasks with charts requires significant perceptual and cognitive effort. Chart Question Answering (CQA) systems automate this process by enabling models to interpret and reason with visual representations of data. However, existing benchmarks like ChartQA lack real-world diversity and have recently shown performance saturation with modern large vision-language models (LVLMs). To address these limitations, we introduce ChartQAPro, a new benchmark that includes 1,341 charts from 157 diverse sources, spanning various chart types, including infographics and dashboards, and featuring 1,948 questions in various types, such as multiple-choice, conversational, hypothetical, and unanswerable questions, to better reflect real-world challenges. Our evaluations with 21 models show a substantial performance drop for LVLMs on ChartQAPro; e.g., Claude Sonnet 3.5 scores 90.5% on ChartQA but only 55.81% on ChartQAPro, underscoring the complexity of chart reasoning. We complement our findings with detailed error analyses and ablation studies, identifying key challenges and opportunities for advancing LVLMs in chart understanding and reasoning. We release ChartQAPro at https://github.com/vis-nlp/ChartQAPro.
△ Less
Submitted 10 April, 2025; v1 submitted 7 April, 2025;
originally announced April 2025.
-
A Theoretical Framework for Graph-based Digital Twins for Supply Chain Management and Optimization
Authors:
Azmine Toushik Wasi,
Mahfuz Ahmed Anik,
Abdur Rahman,
Md. Iqramul Hoque,
MD Shafikul Islam,
Md Manjurul Ahsan
Abstract:
Supply chain management is growing increasingly complex due to globalization, evolving market demands, and sustainability pressures, yet traditional systems struggle with fragmented data and limited analytical capabilities. Graph-based modeling offers a powerful way to capture the intricate relationships within supply chains, while Digital Twins (DTs) enable real-time monitoring and dynamic simula…
▽ More
Supply chain management is growing increasingly complex due to globalization, evolving market demands, and sustainability pressures, yet traditional systems struggle with fragmented data and limited analytical capabilities. Graph-based modeling offers a powerful way to capture the intricate relationships within supply chains, while Digital Twins (DTs) enable real-time monitoring and dynamic simulations. However, current implementations often face challenges related to scalability, data integration, and the lack of sustainability-focused metrics. To address these gaps, we propose a Graph-Based Digital Twin Framework for Supply Chain Optimization, which combines graph modeling with DT architecture to create a dynamic, real-time representation of supply networks. Our framework integrates a Data Integration Layer to harmonize disparate sources, a Graph Construction Module to model complex dependencies, and a Simulation and Analysis Engine for scalable optimization. Importantly, we embed sustainability metrics - such as carbon footprints and resource utilization - into operational dashboards to drive eco-efficiency. By leveraging the synergy between graph-based modeling and DTs, our approach enhances scalability, improves decision-making, and enables organizations to proactively manage disruptions, cut costs, and transition toward greener, more resilient supply chains.
△ Less
Submitted 23 March, 2025;
originally announced April 2025.
-
Characterizing Creativity in Visualization Design
Authors:
Naimul Hoque,
Zinat Ara,
Safwat Ali Khan,
Fanny Chevalier,
Niklas Elmqvist
Abstract:
Understanding the role of creativity in visualization design becomes increasingly important as the field matures, particularly with the emergence of various visualization authoring and recommendation systems. In this paper, we examine how creativity manifests in visualization design processes and how academic research has conceptualized it over time. Through a systematic review of 58 visualization…
▽ More
Understanding the role of creativity in visualization design becomes increasingly important as the field matures, particularly with the emergence of various visualization authoring and recommendation systems. In this paper, we examine how creativity manifests in visualization design processes and how academic research has conceptualized it over time. Through a systematic review of 58 visualization papers that use the terms "creativity" or "creative," we analyze the evolution of creative practices in visualization design. Our findings show that prior literature predominantly used atypical designs through free-form drawings, infographics, pictorials, and data comics to define creative representations. However, creativity in visualization design extends beyond visual representations to encompass early needfinding design activities such as sketching, storyboarding, discussion, and card sorting. Data visualization can also support a wide variety of creative tasks (e.g., fiction writing). We discuss the implications of these findings for fostering innovation within established design paradigms and for developing more sophisticated visualization authoring systems. The full list of coded papers are available here: https://vizcreativity.notion.site/coded-papers.
△ Less
Submitted 2 April, 2025;
originally announced April 2025.
-
Explainable AI-Guided Efficient Approximate DNN Generation for Multi-Pod Systolic Arrays
Authors:
Ayesha Siddique,
Khurram Khalil,
Khaza Anuarul Hoque
Abstract:
Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aim…
▽ More
Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNNs is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aimed at identifying the appropriate approximate multipliers to achieve high energy efficiency with a minimum accuracy loss. To address this problem, we present a novel XAI-Gen methodology, which leverages the analytical model of the emerging hardware accelerator (e.g., Google TPU v4) and explainable artificial intelligence (XAI) to precisely identify the non-critical layers for approximation and quickly discover the appropriate approximate multipliers for AxDNN layers. Our results show that XAI-Gen achieves up to 7x lower energy consumption with only 1-2% accuracy loss. We also showcase the effectiveness of the XAI-Gen approach through a neural architecture search (XAI-NAS) case study. Interestingly, XAI-NAS achieves 40\% higher energy efficiency with up to 5x less execution time when compared to the state-of-the-art NAS methods for generating AxDNNs.
△ Less
Submitted 20 March, 2025;
originally announced March 2025.
-
Active management of battery degradation in wireless sensor network using deep reinforcement learning for group battery replacement
Authors:
Jong-Hyun Jeong,
Hongki Jo,
Qiang Zhou,
Tahsin Afroz Hoque Nishat,
Lang Wu
Abstract:
Wireless sensor networks (WSNs) have become a promising solution for structural health monitoring (SHM), especially in hard-to-reach or remote locations. Battery-powered WSNs offer various advantages over wired systems, however limited battery life has always been one of the biggest obstacles in practical use of the WSNs, regardless of energy harvesting methods. While various methods have been stu…
▽ More
Wireless sensor networks (WSNs) have become a promising solution for structural health monitoring (SHM), especially in hard-to-reach or remote locations. Battery-powered WSNs offer various advantages over wired systems, however limited battery life has always been one of the biggest obstacles in practical use of the WSNs, regardless of energy harvesting methods. While various methods have been studied for battery health management, existing methods exclusively aim to extend lifetime of individual batteries, lacking a system level view. A consequence of applying such methods is that batteries in a WSN tend to fail at different times, posing significant difficulty on planning and scheduling of battery replacement trip. This study investigate a deep reinforcement learning (DRL) method for active battery degradation management by optimizing duty cycle of WSNs at the system level. This active management strategy effectively reduces earlier failure of battery individuals which enable group replacement without sacrificing WSN performances. A simulated environment based on a real-world WSN setup was developed to train a DRL agent and learn optimal duty cycle strategies. The performance of the strategy was validated in a long-term setup with various network sizes, demonstrating its efficiency and scalability.
△ Less
Submitted 22 March, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
Humanoid Policy ~ Human Policy
Authors:
Ri-Zhao Qiu,
Shiqi Yang,
Xuxin Cheng,
Chaitanya Chawla,
Jialong Li,
Tairan He,
Ge Yan,
David J. Yoon,
Ryan Hoque,
Lars Paulsen,
Ge Yang,
Jian Zhang,
Sha Yi,
Guanya Shi,
Xiaolong Wang
Abstract:
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embo…
▽ More
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning. We mitigate the embodiment gap between humanoids and humans from both the data and modeling perspectives. We collect an egocentric task-oriented dataset (PH2D) that is directly aligned with humanoid manipulation demonstrations. We then train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. Co-trained with smaller-scale robot data, HAT directly models humanoid robots and humans as different embodiments without additional supervision. We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency. Code and data: https://human-as-robot.github.io/
△ Less
Submitted 24 March, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
Securing Virtual Reality Experiences: Unveiling and Tackling Cybersickness Attacks with Explainable AI
Authors:
Ripan Kumar Kundu,
Matthew Denton,
Genova Mongalo,
Prasad Calyam,
Khaza Anuarul Hoque
Abstract:
The synergy between virtual reality (VR) and artificial intelligence (AI), specifically deep learning (DL)-based cybersickness detection models, has ushered in unprecedented advancements in immersive experiences by automatically detecting cybersickness severity and adaptively various mitigation techniques, offering a smooth and comfortable VR experience. While this DL-enabled cybersickness detecti…
▽ More
The synergy between virtual reality (VR) and artificial intelligence (AI), specifically deep learning (DL)-based cybersickness detection models, has ushered in unprecedented advancements in immersive experiences by automatically detecting cybersickness severity and adaptively various mitigation techniques, offering a smooth and comfortable VR experience. While this DL-enabled cybersickness detection method provides promising solutions for enhancing user experiences, it also introduces new risks since these models are vulnerable to adversarial attacks; a small perturbation of the input data that is visually undetectable to human observers can fool the cybersickness detection model and trigger unexpected mitigation, thus disrupting user immersive experiences (UIX) and even posing safety risks. In this paper, we present a new type of VR attack, i.e., a cybersickness attack, which successfully stops the triggering of cybersickness mitigation by fooling DL-based cybersickness detection models and dramatically hinders the UIX. Next, we propose a novel explainable artificial intelligence (XAI)-guided cybersickness attack detection framework to detect such attacks in VR to ensure UIX and a comfortable VR experience. We evaluate the proposed attack and the detection framework using two state-of-the-art open-source VR cybersickness datasets: Simulation 2021 and Gameplay dataset. Finally, to verify the effectiveness of our proposed method, we implement the attack and the XAI-based detection using a testbed with a custom-built VR roller coaster simulation with an HTC Vive Pro Eye headset and perform a user study. Our study shows that such an attack can dramatically hinder the UIX. However, our proposed XAI-guided cybersickness attack detection can successfully detect cybersickness attacks and trigger the proper mitigation, effectively reducing VR cybersickness.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
An Empirical Analysis of LLMs for Countering Misinformation
Authors:
Adiba Mahbub Proma,
Neeley Pate,
James Druckman,
Gourab Ghoshal,
Hangfeng He,
Ehsan Hoque
Abstract:
While Large Language Models (LLMs) can amplify online misinformation, they also show promise in tackling misinformation. In this paper, we empirically study the capabilities of three LLMs -- ChatGPT, Gemini, and Claude -- in countering political misinformation. We implement a two-step, chain-of-thought prompting approach, where models first identify credible sources for a given claim and then gene…
▽ More
While Large Language Models (LLMs) can amplify online misinformation, they also show promise in tackling misinformation. In this paper, we empirically study the capabilities of three LLMs -- ChatGPT, Gemini, and Claude -- in countering political misinformation. We implement a two-step, chain-of-thought prompting approach, where models first identify credible sources for a given claim and then generate persuasive responses. Our findings suggest that models struggle to ground their responses in real news sources, and tend to prefer citing left-leaning sources. We also observe varying degrees of response diversity among models. Our findings highlight concerns about using LLMs for fact-checking through only prompt-engineering, emphasizing the need for more robust guardrails. Our results have implications for both researchers and non-technical users.
△ Less
Submitted 28 February, 2025;
originally announced March 2025.
-
SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations
Authors:
Md Saidul Hoque Anik,
Ariful Azad
Abstract:
Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient computation of embedding is one of the dominant functions in the translation-based KG embedding training loop. We address this issue by replacing the core embeddin…
▽ More
Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient computation of embedding is one of the dominant functions in the translation-based KG embedding training loop. We address this issue by replacing the core embedding computation with SpMM (Sparse-Dense Matrix Multiplication) kernels. This allows us to unify multiple scatter (and gather) operations as a single operation, reducing training time and memory usage. We create a general framework for training KG models using sparse kernels and implement four models, namely TransE, TransR, TransH, and TorusE. Our sparse implementations exhibit up to 5.3x speedup on the CPU and up to 4.2x speedup on the GPU with a significantly low GPU memory footprint. The speedups are consistent across large and small datasets for a given model. Our proposed sparse approach can be extended to accelerate other translation-based (such as TransC, TransM, etc.) and non-translational (such as DistMult, ComplEx, RotatE, etc.) models as well. An implementation of the SpTransX framework is publicly available as a Python package in https://github.com/HipGraph/SpTransX.
△ Less
Submitted 30 April, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
Design and Implementation of a Scalable Clinical Data Warehouse for Resource-Constrained Healthcare Systems
Authors:
Shovito Barua Soumma,
Fahim Shahriar,
Umme Niraj Mahi,
Md Hasin Abrar,
Md Abdur Rahman Fahad,
Abu Sayed Md. Latiful Hoque
Abstract:
Centralized electronic health record repositories are critical for advancing disease surveillance, public health research, and evidence-based policymaking. However, developing countries face persistent challenges in achieving this due to fragmented healthcare data sources, inconsistent record-keeping practices, and the absence of standardized patient identifiers, limiting reliable record linkage,…
▽ More
Centralized electronic health record repositories are critical for advancing disease surveillance, public health research, and evidence-based policymaking. However, developing countries face persistent challenges in achieving this due to fragmented healthcare data sources, inconsistent record-keeping practices, and the absence of standardized patient identifiers, limiting reliable record linkage, compromise data interoperability, and limit scalability-obstacles exacerbated by infrastructural constraints and privacy concerns. To address these barriers, this study proposes a scalable, privacy-preserving clinical data warehouse, NCDW, designed for heterogeneous EHR integration in resource-limited settings and tested with 1.16 million clinical records. The framework incorporates a wrapper-based data acquisition layer for secure, automated ingestion of multisource health data and introduces a soundex algorithm to resolve patient identity mismatches in the absence of unique IDs. A modular data mart is designed for disease-specific analytics, demonstrated through a dengue fever case study in Bangladesh, integrating clinical, demographic, and environmental data for outbreak prediction and resource planning. Quantitative assessment of the data mart underscores its utility in strengthening national decision-support systems, highlighting the model's adaptability for infectious disease management. Comparative evaluation of database technologies reveals NoSQL outperforms relational SQL by 40-69% in complex query processing, while system load estimates validate the architecture's capacity to manage 19 million daily records (34TB over 5 years). The framework can be adapted to various healthcare settings across developing nations by modifying the ingestion layer to accommodate standards like ICD-11 and HL7 FHIR, facilitating interoperability for managing infectious diseases (i.e., COVID, tuberculosis).
△ Less
Submitted 2 May, 2025; v1 submitted 23 February, 2025;
originally announced February 2025.
-
ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition
Authors:
Bidyarthi Paul,
Faika Fairuj Preotee,
Shuvashis Sarker,
Shamim Rahim Refat,
Shifat Islam,
Tashreef Muhammad,
Mohammad Ashraful Hoque,
Shahriar Manzoor
Abstract:
Named Entity Recognition (NER) in regional dialects is a critical yet underexplored area in Natural Language Processing (NLP), especially for low-resource languages like Bangla. While NER systems for Standard Bangla have made progress, no existing resources or models specifically address the challenge of regional dialects such as Barishal, Chittagong, Mymensingh, Noakhali, and Sylhet, which exhibi…
▽ More
Named Entity Recognition (NER) in regional dialects is a critical yet underexplored area in Natural Language Processing (NLP), especially for low-resource languages like Bangla. While NER systems for Standard Bangla have made progress, no existing resources or models specifically address the challenge of regional dialects such as Barishal, Chittagong, Mymensingh, Noakhali, and Sylhet, which exhibit unique linguistic features that existing models fail to handle effectively. To fill this gap, we introduce ANCHOLIK-NER, the first benchmark dataset for NER in Bangla regional dialects, comprising 17,405 sentences distributed across five regions. The dataset was sourced from publicly available resources and supplemented with manual translations, ensuring alignment of named entities across dialects. We evaluate three transformer-based models - Bangla BERT, Bangla BERT Base, and BERT Base Multilingual Cased - on this dataset. Our findings demonstrate that BERT Base Multilingual Cased performs best in recognizing named entities across regions, with significant performance observed in Mymensingh with an F1-score of 82.611%. Despite strong overall performance, challenges remain in region like Chittagong, where the models show lower precision and recall. Since no previous NER systems for Bangla regional dialects exist, our work represents a foundational step in addressing this gap. Future work will focus on improving model performance in underperforming regions and expanding the dataset to include more dialects, enhancing the development of dialect-aware NER systems.
△ Less
Submitted 27 May, 2025; v1 submitted 16 February, 2025;
originally announced February 2025.
-
Analysis of Robust and Secure DNS Protocols for IoT Devices
Authors:
Abdullah Aydeger,
Sanzida Hoque,
Engin Zeydan,
Kapal Dev
Abstract:
The DNS (Domain Name System) protocol has been in use since the early days of the Internet. Although DNS as a de facto networking protocol had no security considerations in its early years, there have been many security enhancements, such as DNSSec (Domain Name System Security Extensions), DoT (DNS over Transport Layer Security), DoH (DNS over HTTPS) and DoQ (DNS over QUIC). With all these securit…
▽ More
The DNS (Domain Name System) protocol has been in use since the early days of the Internet. Although DNS as a de facto networking protocol had no security considerations in its early years, there have been many security enhancements, such as DNSSec (Domain Name System Security Extensions), DoT (DNS over Transport Layer Security), DoH (DNS over HTTPS) and DoQ (DNS over QUIC). With all these security improvements, it is not yet clear what resource-constrained Internet-of-Things (IoT) devices should be used for robustness. In this paper, we investigate different DNS security approaches using an edge DNS resolver implemented as a Virtual Network Function (VNF) to replicate the impact of the protocol from an IoT perspective and compare their performances under different conditions. We present our results for cache-based and non-cached responses and evaluate the corresponding security benefits. Our results and framework can greatly help consumers, manufacturers, and the research community decide and implement their DNS protocols depending on the given dynamic network conditions and enable robust Internet access via DNS for different devices.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Authors:
Ahmed Masry,
Juan A. Rodriguez,
Tianyu Zhang,
Suyuchen Wang,
Chao Wang,
Aarash Feizi,
Akshay Kalkunte Suresh,
Abhay Puri,
Xiangru Jian,
Pierre-André Noël,
Sathwik Tejaswi Madhusudhan,
Marco Pedersoli,
Bang Liu,
Nicolas Chapados,
Yoshua Bengio,
Enamul Hoque,
Christopher Pal,
Issam H. Laradji,
David Vazquez,
Perouz Taslakian,
Spandana Gella,
Sai Rajeswar
Abstract:
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or…
▽ More
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), often produce out-of-distribution or noisy inputs, leading to misalignment between the modalities. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where scanned document images must be accurately mapped to their textual content. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods. We provide further analysis demonstrating improved vision-text feature alignment and robustness to noise.
△ Less
Submitted 3 February, 2025;
originally announced February 2025.
-
DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection
Authors:
MD Sadik Hossain Shanto,
Mahir Labib Dihan,
Souvik Ghosh,
Riad Ahmed Anonto,
Hafijul Hoque Chowdhury,
Abir Muhtasim,
Rakib Ahsan,
MD Tanvir Hassan,
MD Roqunuzzaman Sojib,
Sheikh Azizul Hakim,
M. Saifur Rahman
Abstract:
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), focusing on detecting deepfakes across diverse datasets. Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation. These models were specifically chosen for their complementary streng…
▽ More
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), focusing on detecting deepfakes across diverse datasets. Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation. These models were specifically chosen for their complementary strengths. Integration of convolution layers and strided attention in MaxViT is well-suited for detecting local features. In contrast, hybrid use of convolution and attention mechanisms in CoAtNet effectively captures multi-scale features. Robust pretraining with masked image modeling of EVA-02 excels at capturing global features. After training, we freeze the parameters of these models and train the classification heads. Finally, a majority voting ensemble is employed to combine the predictions from these models, improving robustness and generalization to unseen scenarios. The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves a commendable accuracy of 95.83% on the validation dataset.
△ Less
Submitted 27 January, 2025;
originally announced January 2025.
-
Securing Wi-Fi 6 Connection Establishment Against Relay and Spoofing Threats
Authors:
Naureen Hoque,
Hanif Rahbari
Abstract:
Wireless local area networks remain vulnerable to attacks initiated during the connection establishment (CE) phase. Current Wi-Fi security protocols fail to fully mitigate attacks like man-in-the-middle, preamble spoofing, and relaying. To fortify the CE phase, in this paper we design a backward-compatible scheme using a digital signature interwoven into the preambles at the physical (PHY) layer w…
▽ More
Wireless local area networks remain vulnerable to attacks initiated during the connection establishment (CE) phase. Current Wi-Fi security protocols fail to fully mitigate attacks like man-in-the-middle, preamble spoofing, and relaying. To fortify the CE phase, in this paper we design a backward-compatible scheme using a digital signature interwoven into the preambles at the physical (PHY) layer with time constraints to effectively counter those attacks. This approach slices a MAC-layer signature and embeds the slices within CE frame preambles without extending frame size, allowing one or multiple stations to concurrently verify their respective APs' transmissions. The concurrent CEs are supported by enabling the stations to analyze the consistent patterns of PHY-layer headers and identify whether the received frames are the anticipated ones from the expected APs, achieving 100% accuracy without needing to examine their MAC-layer headers. Additionally, we design and implement a fast relay attack to challenge our proposed defense and determine its effectiveness. We extend existing open-source tools to support IEEE 802.11ax to evaluate the effectiveness and practicality of our proposed scheme in a testbed consisting of USRPs, commercial APs, and Wi-Fi devices, and we show that our relay attack detection achieves 96-100% true positive rates. Finally, end-to-end formal security analyses confirm the security and correctness of the proposed solution.
△ Less
Submitted 2 January, 2025;
originally announced January 2025.
-
ARMADA: Augmented Reality for Robot Manipulation and Robot-Free Data Acquisition
Authors:
Nataliya Nechyporenko,
Ryan Hoque,
Christopher Webb,
Mouli Sivapurapu,
Jian Zhang
Abstract:
Teleoperation for robot imitation learning is bottlenecked by hardware availability. Can high-quality robot data be collected without a physical robot? We present a system for augmenting Apple Vision Pro with real-time virtual robot feedback. By providing users with an intuitive understanding of how their actions translate to robot motions, we enable the collection of natural barehanded human data…
▽ More
Teleoperation for robot imitation learning is bottlenecked by hardware availability. Can high-quality robot data be collected without a physical robot? We present a system for augmenting Apple Vision Pro with real-time virtual robot feedback. By providing users with an intuitive understanding of how their actions translate to robot motions, we enable the collection of natural barehanded human data that is compatible with the limitations of physical robot hardware. We conducted a user study with 15 participants demonstrating 3 different tasks each under 3 different feedback conditions and directly replayed the collected trajectories on physical robot hardware. Results suggest live robot feedback dramatically improves the quality of the collected data, suggesting a new avenue for scalable human data collection without access to robot hardware. Videos and more are available at https://nataliya.dev/armada.
△ Less
Submitted 13 December, 2024;
originally announced December 2024.
-
HadaCore: Tensor Core Accelerated Hadamard Transform Kernel
Authors:
Krish Agarwal,
Rishi Astra,
Adnan Hoque,
Mudhakar Srivatsa,
Raghu Ganti,
Less Wright,
Sijia Chen
Abstract:
We present HadaCore, a modified Fast Walsh-Hadamard Transform (FWHT) algorithm optimized for the Tensor Cores present in modern GPU hardware. HadaCore follows the recursive structure of the original FWHT algorithm, achieving the same asymptotic runtime complexity but leveraging a hardware-aware work decomposition that benefits from Tensor Core acceleration. This reduces bottlenecks from compute an…
▽ More
We present HadaCore, a modified Fast Walsh-Hadamard Transform (FWHT) algorithm optimized for the Tensor Cores present in modern GPU hardware. HadaCore follows the recursive structure of the original FWHT algorithm, achieving the same asymptotic runtime complexity but leveraging a hardware-aware work decomposition that benefits from Tensor Core acceleration. This reduces bottlenecks from compute and data exchange. On Nvidia A100 and H100 GPUs, HadaCore achieves speedups of 1.1-1.4x and 1.0-1.3x, with a peak gain of 3.5x and 3.6x respectively, when compared to the existing state-of-the-art implementation of the original algorithm. We also show that when using FP16 or BF16, our implementation is numerically accurate, enabling comparable accuracy on MMLU benchmarks when used in an end-to-end Llama3 inference run with quantized (FP8) attention.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
Accurate Water Level Monitoring in AWD Rice Cultivation Using Convolutional Neural Networks
Authors:
Ahmed Rafi Hasan,
Niloy Kumar Kundu,
Saad Hasan,
Mohammad Rashedul Hoque,
Swakkhar Shatabda
Abstract:
The Alternate Wetting and Drying (AWD) method is a rice-growing water management technique promoted as a sustainable alternative to Continuous Flooding (CF). Climate change has placed the agricultural sector in a challenging position, particularly as global water resources become increasingly scarce, affecting rice production on irrigated lowlands. Rice, a staple food for over half of the world's…
▽ More
The Alternate Wetting and Drying (AWD) method is a rice-growing water management technique promoted as a sustainable alternative to Continuous Flooding (CF). Climate change has placed the agricultural sector in a challenging position, particularly as global water resources become increasingly scarce, affecting rice production on irrigated lowlands. Rice, a staple food for over half of the world's population, demands significantly more water than other major crops. In Bangladesh, Boro rice, in particular, requires considerable water inputs during its cultivation. Traditionally, farmers manually measure water levels, a process that is both time-consuming and prone to errors. While ultrasonic sensors offer improvements in water height measurement, they still face limitations, such as susceptibility to weather conditions and environmental factors. To address these issues, we propose a novel approach that automates water height measurement using computer vision, specifically through a convolutional neural network (CNN). Our attention-based architecture achieved an $R^2$ score of 0.9885 and a Mean Squared Error (MSE) of 0.2766, providing a more accurate and efficient solution for managing AWD systems.
△ Less
Submitted 12 December, 2024; v1 submitted 11 December, 2024;
originally announced December 2024.
-
Zephyr quantum-assisted hierarchical Calo4pQVAE for particle-calorimeter interactions
Authors:
Ian Lu,
Hao Jia,
Sebastian Gonzalez,
Deniz Sogutlu,
J. Quetzalcoatl Toledo-Marin,
Sehmimul Hoque,
Abhishek Abhishek,
Colin Gay,
Roger Melko,
Eric Paquet,
Geoffrey Fox,
Maximilian Swiatlowski,
Wojciech Fedorko
Abstract:
With the approach of the High Luminosity Large Hadron Collider (HL-LHC) era set to begin particle collisions by the end of this decade, it is evident that the computational demands of traditional collision simulation methods are becoming increasingly unsustainable. Existing approaches, which rely heavily on first-principles Monte Carlo simulations for modeling event showers in calorimeters, are pr…
▽ More
With the approach of the High Luminosity Large Hadron Collider (HL-LHC) era set to begin particle collisions by the end of this decade, it is evident that the computational demands of traditional collision simulation methods are becoming increasingly unsustainable. Existing approaches, which rely heavily on first-principles Monte Carlo simulations for modeling event showers in calorimeters, are projected to require millions of CPU-years annually -- far exceeding current computational capacities. This bottleneck presents an exciting opportunity for advancements in computational physics by integrating deep generative models with quantum simulations. We propose a quantum-assisted hierarchical deep generative surrogate founded on a variational autoencoder (VAE) in combination with an energy conditioned restricted Boltzmann machine (RBM) embedded in the model's latent space as a prior. By mapping the topology of D-Wave's Zephyr quantum annealer (QA) into the nodes and couplings of a 4-partite RBM, we leverage quantum simulation to accelerate our shower generation times significantly. To evaluate our framework, we use Dataset 2 of the CaloChallenge 2022. Through the integration of classical computation and quantum simulation, this hybrid framework paves way for utilizing large-scale quantum simulations as priors in deep generative models.
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
Machine-agnostic Automated Lumbar MRI Segmentation using a Cascaded Model Based on Generative Neurons
Authors:
Promit Basak,
Rusab Sarmun,
Saidul Kabir,
Israa Al-Hashimi,
Enamul Hoque Bhuiyan,
Anwarul Hasan,
Muhammad Salman Khan,
Muhammad E. H. Chowdhury
Abstract:
Automated lumbar spine segmentation is very crucial for modern diagnosis systems. In this study, we introduce a novel machine-agnostic approach for segmenting lumbar vertebrae and intervertebral discs from MRI images, employing a cascaded model that synergizes an ROI detection and a Self-organized Operational Neural Network (Self-ONN)-based encoder-decoder network for segmentation. Addressing the…
▽ More
Automated lumbar spine segmentation is very crucial for modern diagnosis systems. In this study, we introduce a novel machine-agnostic approach for segmenting lumbar vertebrae and intervertebral discs from MRI images, employing a cascaded model that synergizes an ROI detection and a Self-organized Operational Neural Network (Self-ONN)-based encoder-decoder network for segmentation. Addressing the challenge of diverse MRI modalities, our methodology capitalizes on a unique dataset comprising images from 12 scanners and 34 subjects, enhanced through strategic preprocessing and data augmentation techniques. The YOLOv8 medium model excels in ROI extraction, achieving an excellent performance of 0.916 mAP score. Significantly, our Self-ONN-based model, combined with a DenseNet121 encoder, demonstrates excellent performance in lumbar vertebrae and IVD segmentation with a mean Intersection over Union (IoU) of 83.66%, a sensitivity of 91.44%, and Dice Similarity Coefficient (DSC) of 91.03%, as validated through rigorous 10-fold cross-validation. This study not only showcases an effective approach to MRI segmentation in spine-related disorders but also sets the stage for future advancements in automated diagnostic tools, emphasizing the need for further dataset expansion and model refinement for broader clinical applicability.
△ Less
Submitted 23 November, 2024;
originally announced November 2024.
-
Hardware Accelerators for Artificial Intelligence
Authors:
S M Mojahidul Ahsan,
Anurag Dhungel,
Mrittika Chowdhury,
Md Sakib Hasan,
Tamzidul Hoque
Abstract:
In this chapter, we aim to explore an in-depth exploration of the specialized hardware accelerators designed to enhance Artificial Intelligence (AI) applications, focusing on their necessity, development, and impact on the field of AI. It covers the transition from traditional computing systems to advanced AI-specific hardware, addressing the growing demands of AI algorithms and the inefficiencies…
▽ More
In this chapter, we aim to explore an in-depth exploration of the specialized hardware accelerators designed to enhance Artificial Intelligence (AI) applications, focusing on their necessity, development, and impact on the field of AI. It covers the transition from traditional computing systems to advanced AI-specific hardware, addressing the growing demands of AI algorithms and the inefficiencies of conventional architectures. The discussion extends to various types of accelerators, including GPUs, FPGAs, and ASICs, and their roles in optimizing AI workloads. Additionally, it touches on the challenges and considerations in designing and implementing these accelerators, along with future prospects in the evolution of AI hardware. This comprehensive overview aims to equip readers with a clear understanding of the current landscape and future directions in AI hardware development, making it accessible to both experts and newcomers to the field.
△ Less
Submitted 18 December, 2024; v1 submitted 20 November, 2024;
originally announced November 2024.
-
Automated Personnel Selection for Software Engineers Using LLM-Based Profile Evaluation
Authors:
Ahmed Akib Jawad Karim,
Shahria Hoque,
Md. Golam Rabiul Alam,
Md. Zia Uddin
Abstract:
Organizational success in todays competitive employment market depends on choosing the right staff. This work evaluates software engineer profiles using an automated staff selection method based on advanced natural language processing (NLP) techniques. A fresh dataset was generated by collecting LinkedIn profiles with important attributes like education, experience, skills, and self-introduction.…
▽ More
Organizational success in todays competitive employment market depends on choosing the right staff. This work evaluates software engineer profiles using an automated staff selection method based on advanced natural language processing (NLP) techniques. A fresh dataset was generated by collecting LinkedIn profiles with important attributes like education, experience, skills, and self-introduction. Expert feedback helped transformer models including RoBERTa, DistilBERT, and a customized BERT variation, LastBERT, to be adjusted. The models were meant to forecast if a candidate's profile fit the selection criteria, therefore allowing automated ranking and assessment. With 85% accuracy and an F1 score of 0.85, RoBERTa performed the best; DistilBERT provided comparable results at less computing expense. Though light, LastBERT proved to be less effective, with 75% accuracy. The reusable models provide a scalable answer for further categorization challenges. This work presents a fresh dataset and technique as well as shows how transformer models could improve recruiting procedures. Expanding the dataset, enhancing model interpretability, and implementing the system in actual environments will be part of future activities.
△ Less
Submitted 3 November, 2024; v1 submitted 30 October, 2024;
originally announced October 2024.
-
AI Can Enhance Creativity in Social Networks
Authors:
Raiyan Abdul Baten,
Ali Sarosh Bangash,
Krish Veera,
Gourab Ghoshal,
Ehsan Hoque
Abstract:
Can peer recommendation engines elevate people's creative performances in self-organizing social networks? Answering this question requires resolving challenges in data collection (e.g., tracing inspiration links and psycho-social attributes of nodes) and intervention design (e.g., balancing idea stimulation and redundancy in evolving information environments). We trained a model that predicts peo…
▽ More
Can peer recommendation engines elevate people's creative performances in self-organizing social networks? Answering this question requires resolving challenges in data collection (e.g., tracing inspiration links and psycho-social attributes of nodes) and intervention design (e.g., balancing idea stimulation and redundancy in evolving information environments). We trained a model that predicts people's ideation performances using semantic and network-structural features in an online platform. Using this model, we built SocialMuse, which maximizes people's predicted performances to generate peer recommendations for them. We found treatment networks leveraging SocialMuse outperforming AI-agnostic control networks in several creativity measures. The treatment networks were more decentralized than the control, as SocialMuse increasingly emphasized network-structural features at large network sizes. This decentralization spreads people's inspiration sources, helping inspired ideas stand out better. Our study provides actionable insights into building intelligent systems for elevating creativity.
△ Less
Submitted 11 December, 2024; v1 submitted 19 October, 2024;
originally announced October 2024.
-
MTDNS: Moving Target Defense for Resilient DNS Infrastructure
Authors:
Abdullah Aydeger,
Pei Zhou,
Sanzida Hoque,
Marco Carvalho,
Engin Zeydan
Abstract:
One of the most critical components of the Internet that an attacker could exploit is the DNS (Domain Name System) protocol and infrastructure. Researchers have been constantly developing methods to detect and defend against the attacks against DNS, specifically DNS flooding attacks. However, most solutions discard packets for defensive approaches, which can cause legitimate packets to be dropped,…
▽ More
One of the most critical components of the Internet that an attacker could exploit is the DNS (Domain Name System) protocol and infrastructure. Researchers have been constantly developing methods to detect and defend against the attacks against DNS, specifically DNS flooding attacks. However, most solutions discard packets for defensive approaches, which can cause legitimate packets to be dropped, making them highly dependable on detection strategies. In this paper, we propose MTDNS, a resilient MTD-based approach that employs Moving Target Defense techniques through Software Defined Networking (SDN) switches to redirect traffic to alternate DNS servers that are dynamically created and run under the Network Function Virtualization (NFV) framework. The proposed approach is implemented in a testbed environment by running our DNS servers as separate Virtual Network Functions, NFV Manager, SDN switches, and an SDN Controller. The experimental result shows that the MTDNS approach achieves a much higher success rate in resolving DNS queries and significantly reduces average latency even if there is a DNS flooding attack.
△ Less
Submitted 3 October, 2024;
originally announced October 2024.
-
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
Authors:
Shayekh Bin Islam,
Md Asib Rahman,
K S M Tozammel Hossain,
Enamul Hoque,
Shafiq Joty,
Md Rizwan Parvez
Abstract:
Retrieval-Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs), but existing methods often suffer from limited reasoning capabilities in effectively using the retrieved evidence, particularly when using open-source LLMs. To mitigate this gap, we introduce a novel framework, Open-RAG, designed to enhance reasoning capabilities in RAG with open-so…
▽ More
Retrieval-Augmented Generation (RAG) has been shown to enhance the factual accuracy of Large Language Models (LLMs), but existing methods often suffer from limited reasoning capabilities in effectively using the retrieved evidence, particularly when using open-source LLMs. To mitigate this gap, we introduce a novel framework, Open-RAG, designed to enhance reasoning capabilities in RAG with open-source LLMs. Our framework transforms an arbitrary dense LLM into a parameter-efficient sparse mixture of experts (MoE) model capable of handling complex reasoning tasks, including both single- and multi-hop queries. Open-RAG uniquely trains the model to navigate challenging distractors that appear relevant but are misleading. As a result, Open-RAG leverages latent learning, dynamically selecting relevant experts and integrating external knowledge effectively for more accurate and contextually relevant responses. In addition, we propose a hybrid adaptive retrieval method to determine retrieval necessity and balance the trade-off between performance gain and inference speed. Experimental results show that the Llama2-7B-based Open-RAG outperforms state-of-the-art LLMs and RAG models such as ChatGPT, Self-RAG, and Command R+ in various knowledge-intensive tasks. We open-source our code and models at https://openragmoe.github.io/
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
Natural Language Generation for Visualizations: State of the Art, Challenges and Future Directions
Authors:
Enamul Hoque,
Mohammed Saidul Islam
Abstract:
Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core me…
▽ More
Natural language and visualization are two complementary modalities of human communication that play a crucial role in conveying information effectively. While visualizations help people discover trends, patterns, and anomalies in data, natural language descriptions help explain these insights. Thus, combining text with visualizations is a prevalent technique for effectively delivering the core message of the data. Given the rise of natural language generation (NLG), there is a growing interest in automatically creating natural language descriptions for visualizations, which can be used as chart captions, answering questions about charts, or telling data-driven stories. In this survey, we systematically review the state of the art on NLG for visualizations and introduce a taxonomy of the problem. The NLG tasks fall within the domain of Natural Language Interfaces (NLI) for visualization, an area that has garnered significant attention from both the research community and industry. To narrow down the scope of the survey, we primarily concentrate on the research works that focus on text generation for visualizations. To characterize the NLG problem and the design space of proposed solutions, we pose five Wh-questions, why and how NLG tasks are performed for visualizations, what the task inputs and outputs are, as well as where and when the generated texts are integrated with visualizations. We categorize the solutions used in the surveyed papers based on these "five Wh-questions." Finally, we discuss the key challenges and potential avenues for future research in this domain.
△ Less
Submitted 29 September, 2024;
originally announced September 2024.
-
FuzzEval: Assessing Fuzzers on Generating Context-Sensitive Inputs
Authors:
S Mahmudul Hasan,
Polina Kozyreva,
Endadul Hoque
Abstract:
Cryptographic protocols form the backbone of modern security systems, yet vulnerabilities persist within their implementations. Traditional testing techniques, including fuzzing, have struggled to effectively identify vulnerabilities in cryptographic libraries due to their reliance on context-sensitive inputs. This paper presents a comprehensive evaluation of eleven state-of-the-art fuzzers' abili…
▽ More
Cryptographic protocols form the backbone of modern security systems, yet vulnerabilities persist within their implementations. Traditional testing techniques, including fuzzing, have struggled to effectively identify vulnerabilities in cryptographic libraries due to their reliance on context-sensitive inputs. This paper presents a comprehensive evaluation of eleven state-of-the-art fuzzers' ability to generate context-sensitive inputs for testing a cryptographic standard, PKCS#1-v1.5, across thirteen implementations. Our study reveals nuanced performance differences among the fuzzers in terms of the validity and diversity of the produced inputs. This investigation underscores the limitations of existing fuzzers in handling context-sensitive inputs. These findings are expected to drive further research and development in this area.
△ Less
Submitted 18 September, 2024;
originally announced September 2024.
-
Uddessho: An Extensive Benchmark Dataset for Multimodal Author Intent Classification in Low-Resource Bangla Language
Authors:
Fatema Tuj Johora Faria,
Mukaffi Bin Moin,
Md. Mahfuzur Rahman,
Md Morshed Alam Shanto,
Asif Iftekher Fahim,
Md. Moinul Hoque
Abstract:
With the increasing popularity of daily information sharing and acquisition on the Internet, this paper introduces an innovative approach for intent classification in Bangla language, focusing on social media posts where individuals share their thoughts and opinions. The proposed method leverages multimodal data with particular emphasis on authorship identification, aiming to understand the underl…
▽ More
With the increasing popularity of daily information sharing and acquisition on the Internet, this paper introduces an innovative approach for intent classification in Bangla language, focusing on social media posts where individuals share their thoughts and opinions. The proposed method leverages multimodal data with particular emphasis on authorship identification, aiming to understand the underlying purpose behind textual content, especially in the context of varied user-generated posts on social media. Current methods often face challenges in low-resource languages like Bangla, particularly when author traits intricately link with intent, as observed in social media posts. To address this, we present the Multimodal-based Author Bangla Intent Classification (MABIC) framework, utilizing text and images to gain deeper insights into the conveyed intentions. We have created a dataset named "Uddessho," comprising 3,048 instances sourced from social media. Our methodology comprises two approaches for classifying textual intent and multimodal author intent, incorporating early fusion and late fusion techniques. In our experiments, the unimodal approach achieved an accuracy of 64.53% in interpreting Bangla textual intent. In contrast, our multimodal approach significantly outperformed traditional unimodal methods, achieving an accuracy of 76.19%. This represents an improvement of 11.66%. To our best knowledge, this is the first research work on multimodal-based author intent classification for low-resource Bangla language social media posts.
△ Less
Submitted 14 September, 2024;
originally announced September 2024.
-
Mazed and Confused: A Dataset of Cybersickness, Working Memory, Mental Load, Physical Load, and Attention During a Real Walking Task in VR
Authors:
Jyotirmay Nag Setu,
Joshua M Le,
Ripan Kumar Kundu,
Barry Giesbrecht,
Tobias Höllerer,
Khaza Anuarul Hoque,
Kevin Desai,
John Quarles
Abstract:
Virtual Reality (VR) is quickly establishing itself in various industries, including training, education, medicine, and entertainment, in which users are frequently required to carry out multiple complex cognitive and physical activities. However, the relationship between cognitive activities, physical activities, and familiar feelings of cybersickness is not well understood and thus can be unpred…
▽ More
Virtual Reality (VR) is quickly establishing itself in various industries, including training, education, medicine, and entertainment, in which users are frequently required to carry out multiple complex cognitive and physical activities. However, the relationship between cognitive activities, physical activities, and familiar feelings of cybersickness is not well understood and thus can be unpredictable for developers. Researchers have previously provided labeled datasets for predicting cybersickness while users are stationary, but there have been few labeled datasets on cybersickness while users are physically walking. Thus, from 39 participants, we collected head orientation, head position, eye tracking, images, physiological readings from external sensors, and the self-reported cybersickness severity, physical load, and mental load in VR. Throughout the data collection, participants navigated mazes via real walking and performed tasks challenging their attention and working memory. To demonstrate the dataset's utility, we conducted a case study of training classifiers in which we achieved 95% accuracy for cybersickness severity classification. The noteworthy performance of the straightforward classifiers makes this dataset ideal for future researchers to develop cybersickness detection and reduction models. To better understand the features that helped with classification, we performed SHAP(SHapley Additive exPlanations) analysis, highlighting the importance of eye tracking and physiological measures for cybersickness prediction while walking. This open dataset can allow future researchers to study the connection between cybersickness and cognitive loads and develop prediction models. This dataset will empower future VR developers to design efficient and effective Virtual Environments by improving cognitive load management and minimizing cybersickness.
△ Less
Submitted 10 September, 2024;
originally announced September 2024.
-
A Persistent Hierarchical Bloom Filter-based Framework for Authentication and Tracking of ICs
Authors:
Fairuz Shadmani Shishir,
Md Mashfiq Rizvee,
Tanvir Hossain,
Tamzidul Hoque,
Domenic Forte,
Sumaiya Shomaji
Abstract:
Detecting counterfeit integrated circuits (ICs) in unreliable supply chains demands robust tracking and authentication. Physical Unclonable Functions (PUFs) offer unique IC identifiers, but noise undermines their utility. This study introduces the Persistent Hierarchical Bloom Filter (PHBF) framework, ensuring swift and accurate IC authentication with an accuracy rate of 100% across the supply cha…
▽ More
Detecting counterfeit integrated circuits (ICs) in unreliable supply chains demands robust tracking and authentication. Physical Unclonable Functions (PUFs) offer unique IC identifiers, but noise undermines their utility. This study introduces the Persistent Hierarchical Bloom Filter (PHBF) framework, ensuring swift and accurate IC authentication with an accuracy rate of 100% across the supply chain even with noisy PUF-generated signatures.
△ Less
Submitted 22 September, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Post-Quantum Secure UE-to-UE Communications
Authors:
Sanzida Hoque,
Abdullah Aydeger,
Engin Zeydan
Abstract:
The rapid development of quantum computing poses a significant threat to the security of current cryptographic systems, including those used in User Equipment (UE) for mobile communications. Conventional cryptographic algorithms such as Rivest-Shamir-Adleman (RSA) and Elliptic curve cryptography (ECC) are vulnerable to quantum computing attacks, which could jeopardize the confidentiality, integrit…
▽ More
The rapid development of quantum computing poses a significant threat to the security of current cryptographic systems, including those used in User Equipment (UE) for mobile communications. Conventional cryptographic algorithms such as Rivest-Shamir-Adleman (RSA) and Elliptic curve cryptography (ECC) are vulnerable to quantum computing attacks, which could jeopardize the confidentiality, integrity, and availability of sensitive data transmitted by UEs. This demo paper proposes the integration of Post-Quantum Cryptography (PQC) in TLS for UE Communication to mitigate the risks of quantum attacks. We present our setup and explain each of the components used. We also provide the entire workflow of the demo for other researchers to replicate the same setup. By addressing the implementation of PQC within a 5G network to secure UE-to-UE communication, this research aims to pave the way for developing quantum-resistant mobile devices and securing the future of wireless communications.
△ Less
Submitted 20 August, 2024;
originally announced August 2024.
-
SHARP-Net: A Refined Pyramid Network for Deficiency Segmentation in Culverts and Sewer Pipes
Authors:
Rasha Alshawi,
Md Meftahul Ferdaus,
Md Tamjidul Hoque,
Kendall Niles,
Ken Pathak,
Steve Sloan,
Mahdi Abdelguerfi
Abstract:
This paper introduces Semantic Haar-Adaptive Refined Pyramid Network (SHARP-Net), a novel architecture for semantic segmentation. SHARP-Net integrates a bottom-up pathway featuring Inception-like blocks with varying filter sizes (3x3…
▽ More
This paper introduces Semantic Haar-Adaptive Refined Pyramid Network (SHARP-Net), a novel architecture for semantic segmentation. SHARP-Net integrates a bottom-up pathway featuring Inception-like blocks with varying filter sizes (3x3$ and 5x5), parallel max-pooling, and additional spatial detection layers. This design captures multi-scale features and fine structural details. Throughout the network, depth-wise separable convolutions are used to reduce complexity. The top-down pathway of SHARP-Net focuses on generating high-resolution features through upsampling and information fusion using $1\times1$ and $3\times3$ depth-wise separable convolutions. We evaluated our model using our developed challenging Culvert-Sewer Defects dataset and the benchmark DeepGlobe Land Cover dataset. Our experimental evaluation demonstrated the base model's (excluding Haar-like features) effectiveness in handling irregular defect shapes, occlusions, and class imbalances. It outperformed state-of-the-art methods, including U-Net, CBAM U-Net, ASCU-Net, FPN, and SegFormer, achieving average improvements of 14.4% and 12.1% on the Culvert-Sewer Defects and DeepGlobe Land Cover datasets, respectively, with IoU scores of 77.2% and 70.6%. Additionally, the training time was reduced. Furthermore, the integration of carefully selected and fine-tuned Haar-like features enhanced the performance of deep learning models by at least 20%. The proposed SHARP-Net, incorporating Haar-like features, achieved an impressive IoU of 94.75%, representing a 22.74% improvement over the base model. These features were also applied to other deep learning models, showing a 35.0% improvement, proving their versatility and effectiveness. SHARP-Net thus provides a powerful and efficient solution for accurate semantic segmentation in challenging real-world scenarios.
△ Less
Submitted 2 August, 2024;
originally announced August 2024.
-
DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts
Authors:
Mohammed Saidul Islam,
Md Tahmid Rahman Laskar,
Md Rizwan Parvez,
Enamul Hoque,
Shafiq Joty
Abstract:
Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating huma…
▽ More
Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating human intervention, which can be time-consuming and mentally taxing. While Large Language Models (LLMs) excel in various NLP tasks, their ability to generate coherent and comprehensive data stories remains underexplored. In this work, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents designed to replicate the human storytelling process: one for understanding and describing the data (Reflection), generating the outline, and narration, and another for verification at each intermediary step. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
△ Less
Submitted 3 October, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots
Authors:
Ajnabiul Hoque,
Manajit Das,
Mayank Baranwal,
Raghavan B. Sunoj
Abstract:
A chemical reaction mechanism (CRM) is a sequence of molecular-level events involving bond-breaking/forming processes, generating transient intermediates along the reaction pathway as reactants transform into products. Understanding such mechanisms is crucial for designing and discovering new reactions. One of the currently available methods to probe CRMs is quantum mechanical (QM) computations. T…
▽ More
A chemical reaction mechanism (CRM) is a sequence of molecular-level events involving bond-breaking/forming processes, generating transient intermediates along the reaction pathway as reactants transform into products. Understanding such mechanisms is crucial for designing and discovering new reactions. One of the currently available methods to probe CRMs is quantum mechanical (QM) computations. The resource-intensive nature of QM methods and the scarcity of mechanism-based datasets motivated us to develop reliable ML models for predicting mechanisms. In this study, we created a comprehensive dataset with seven distinct classes, each representing uniquely characterized elementary steps. Subsequently, we developed an interpretable attention-based GNN that achieved near-unity and 96% accuracy, respectively for reaction step classification and the prediction of reactive atoms in each such step, capturing interactions between the broader reaction context and local active regions. The near-perfect classification enables accurate prediction of both individual events and the entire CRM, mitigating potential drawbacks of Seq2Seq approaches, where a wrongly predicted character leads to incoherent CRM identification. In addition to interpretability, our model adeptly identifies key atom(s) even from out-of-distribution classes. This generalizabilty allows for the inclusion of new reaction types in a modular fashion, thus will be of value to experts for understanding the reactivity of new molecules.
△ Less
Submitted 14 July, 2024;
originally announced July 2024.