-
Advanced fraud detection using machine learning models: enhancing financial transaction security
Authors:
Nudrat Fariha,
Md Nazmuddin Moin Khan,
Md Iqbal Hossain,
Syed Ali Reza,
Joy Chakra Bortty,
Kazi Sharmin Sultana,
Md Shadidur Islam Jawad,
Saniah Safat,
Md Abdul Ahad,
Maksuda Begum
Abstract:
The rise of digital payments has accelerated the need for intelligent and scalable systems to detect fraud. This research presents an end-to-end, feature-rich machine learning framework for detecting credit card transaction anomalies and fraud using real-world data. The study begins by merging transactional, cardholder, merchant, and merchant category datasets from a relational database to create a unified analytical view. Through the feature engineering process, we extract behavioural signals such as average spending, deviation from historical patterns, transaction timing irregularities, and category frequency metrics. These features are enriched with temporal markers such as hour, day of week, and weekend indicators to expose latent patterns that indicate fraudulent behaviours. Exploratory data analysis reveals contextual transaction trends across all the dataset features. Using the transactional data, we train and evaluate a range of unsupervised models: Isolation Forest, One-Class SVM, and a deep autoencoder trained to reconstruct normal behavior. These models flag the top 1% of reconstruction errors as outliers. PCA visualizations illustrate each model's ability to separate anomalies in a two-dimensional latent space. We further segment the transaction landscape using K-Means clustering and DBSCAN to identify dense clusters of normal activity and isolate sparse, suspicious regions.
Submitted 12 June, 2025;
originally announced June 2025.
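The abstract above says the models flag the top 1% of reconstruction errors as outliers. The following is a minimal sketch of that thresholding step only, under stated assumptions: the function name `flag_top_fraction` and its parameters are hypothetical, the scores are taken to be any per-transaction anomaly score where larger means more anomalous, and nothing here is taken from the authors' code.

```python
import math


def flag_top_fraction(scores, fraction=0.01):
    """Flag the highest-scoring `fraction` of items as outliers.

    Hypothetical helper for illustration; `scores` is assumed to hold one
    anomaly score (e.g. a reconstruction error) per transaction.
    Returns (threshold, flags), where flags[i] is True for flagged items.
    """
    if not scores:
        return None, []
    # Number of items to flag, always at least one.
    k = max(1, math.ceil(len(scores) * fraction))
    # The k-th largest score serves as the cut-off.
    threshold = sorted(scores, reverse=True)[k - 1]
    # Ties at the threshold are all flagged, so slightly more than k
    # items can be marked when scores repeat.
    flags = [s >= threshold for s in scores]
    return threshold, flags


# Usage: 100 synthetic scores, so the top 1% is a single item.
errors = [float(i) for i in range(100)]
threshold, flags = flag_top_fraction(errors, fraction=0.01)
print(threshold, sum(flags))
```

A fixed-fraction cut-off like this always flags roughly the same share of transactions regardless of how many are actually fraudulent, which is a design choice worth keeping in mind when reading the reported results.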
-
Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME
Authors:
Bidyarthi Paul,
SM Musfiqur Rahman,
Dipta Biswas,
Md. Ziaul Hasan,
Md. Zahid Hossain
Abstract:
Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models: Linear SVM, KNN, and Random Forest with n-gram data from a TF-IDF vectorizer. We additionally investigate how PCA-based dimensionality reduction affects performance. Moreover, we use a BiLSTM model and AdaBoost to improve decision trees. To make our machine learning models easier to understand, we use LIME to explain the predictions of the AdaBoost classifier, which uses decision trees. With the goal of advancing sentiment analysis in languages with limited resources, our work examines various techniques to identify efficient approaches to emotion identification in Bangla.
Submitted 11 June, 2025;
originally announced June 2025.
-
RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being
Authors:
Rahatara Ferdousi,
M Anwar Hossain
Abstract:
The rise of large language models (LLMs) has created new possibilities for digital twins in healthcare. However, the deployment of such systems in consumer health contexts raises significant concerns related to hallucination, bias, lack of transparency, and ethical misuse. In response to recommendations from health authorities such as the World Health Organization (WHO), we propose Responsible Health Twin (RHealthTwin), a principled framework for building and governing AI-powered digital twins for well-being assistance. RHealthTwin processes multimodal inputs that guide a health-focused LLM to produce safe, relevant, and explainable responses. At the core of RHealthTwin is the Responsible Prompt Engine (RPE), which addresses the limitations of traditional LLM configuration. Conventionally, users input an unstructured prompt and a system instruction to configure the LLM, which increases the risk of hallucination. In contrast, RPE extracts predefined slots dynamically to structure both inputs. This guides the language model to generate responses that are context-aware, personalized, fair, reliable, and explainable for well-being assistance. The framework further adapts over time through a feedback loop that updates the prompt structure based on user satisfaction. We evaluate RHealthTwin across four consumer health domains: mental support, symptom triage, nutrition planning, and activity coaching. RPE achieves state-of-the-art results with BLEU = 0.41, ROUGE-L = 0.63, and BERTScore = 0.89 on benchmark datasets. We also achieve over 90% in ethical compliance and instruction-following metrics using LLM-as-judge evaluation, outperforming baseline strategies. We envision RHealthTwin as a forward-looking foundation for responsible LLM-based applications in health and well-being.
Submitted 10 June, 2025;
originally announced June 2025.
-
Unraveling Ethereum's Mempool: The Impact of Fee Fairness, Transaction Prioritization, and Consensus Efficiency
Authors:
S M Mostaq Hossain,
Amani Altarawneh
Abstract:
Ethereum's transaction pool (mempool) dynamics and fee market efficiency critically affect transaction inclusion, validator workload, and overall network performance. This research empirically analyzes gas price variations, mempool clearance rates, and block finalization times in Ethereum's proof-of-stake ecosystem using real-time data from Geth and Prysm nodes. We observe that high-fee transactions are consistently prioritized, while low-fee transactions face delays or exclusion despite EIP-1559's intended improvements. Mempool congestion remains a key factor in validator efficiency and proposal latency. We provide empirical evidence of persistent fee-based disparities and show that extremely high fees do not always guarantee faster confirmation, revealing inefficiencies in the current fee market. To address these issues, we propose congestion-aware fee adjustments, reserved block slots for low-fee transactions, and improved handling of out-of-gas vulnerabilities. By mitigating prioritization bias and execution inefficiencies, our findings support more equitable transaction inclusion, enhance validator performance, and promote scalability. This work contributes to Ethereum's long-term decentralization by reducing dependence on high transaction fees for network participation.
Submitted 9 June, 2025;
originally announced June 2025.
-
Physics of unraveling and micromechanics of hagfish threads
Authors:
Mohammad Tanver Hossain,
Dakota Piorkowski,
Andrew Lowe,
Wonsik Eom,
Abhishek Shetty,
Sameh H. Tawfick,
Douglas S. Fudge,
Randy H. Ewoldt
Abstract:
Hagfish slime is a unique biological material composed of mucus and protein threads that rapidly form a cohesive network when deployed in seawater. The forces involved in thread deployment and interactions among mucus and threads are key to understanding how hagfish slime rapidly assembles into a cohesive, functional network. Despite extensive interest in its biophysical properties, the mechanical forces governing thread deployment and interaction remain poorly quantified. Here, we present the first direct in situ measurements of the micromechanical forces involved in hagfish slime formation, including mucus mechanical properties, skein peeling force, thread-mucus adhesion, and thread-thread cohesion. Using a custom glass-rod force sensing system, we show that thread deployment initiates when peeling forces exceed a threshold of approximately 6.8 nN. To understand the flow strength required for unraveling, we used a rheo-optic setup to impose controlled shear flow, enabling us to directly observe unraveling dynamics and determine the critical shear rate for unraveling of the skeins, which we then interpreted using an updated peeling-based force balance model. Our results reveal that thread-mucus adhesion dominates over thread-thread adhesion and that deployed threads contribute minimally to bulk shear rheology at constant flow rate. These findings clarify the physics underlying the rapid, flow-triggered assembly of hagfish slime and inform future designs of synthetic deployable fiber-gel systems.
Submitted 8 June, 2025;
originally announced June 2025.
-
Fingerprinting Deep Learning Models via Network Traffic Patterns in Federated Learning
Authors:
Md Nahid Hasan Shuvo,
Moinul Hossain
Abstract:
Federated Learning (FL) is increasingly adopted as a decentralized machine learning paradigm due to its capability to preserve data privacy by training models without centralizing user data. However, FL is susceptible to indirect privacy breaches via network traffic analysis, an area not explored in existing research. The primary objective of this research is to study the feasibility of fingerprinting deep learning models deployed within FL environments by analyzing their network-layer traffic information. In this paper, we conduct an experimental evaluation using various deep learning architectures (e.g., CNN, RNN) within a federated learning testbed. We utilize machine learning algorithms, including Support Vector Machines (SVM), Random Forest, and Gradient Boosting, to fingerprint unique patterns within the traffic data. Our experiments show high fingerprinting accuracy, achieving 100% accuracy using Random Forest and around 95.7% accuracy using SVM and Gradient Boosting classifiers. This analysis suggests that we can identify specific architectures from a subsection of the network traffic. Hence, if an adversary knows the underlying DL architecture, they can exploit that information to conduct targeted attacks. These findings suggest a notable security vulnerability in FL systems and the necessity of strengthening them at the network level.
Submitted 2 June, 2025;
originally announced June 2025.
-
High-throughput viscometry via machine-learning from videos of inverted vials
Authors:
Ignacio Arretche,
Mohammad Tanver Hossain,
Ramdas Tiwari,
Abbie Kim,
Mya G. Mills,
Connor D. Armstrong,
Jacob J. Lessard,
Sameh H. Tawfick,
Randy H. Ewoldt
Abstract:
Although the inverted vial test has been widely used as a qualitative method for estimating fluid viscosity, quantitative rheological characterization has remained limited due to its complex, uncontrolled flow, driven by gravity, surface tension, inertia, and initial conditions. Here, we present a computer vision (CV) viscometer that automates the inverted vial test and enables quantitative viscosity inference across nearly five orders of magnitude (0.01-1000 Pa·s), without requiring direct velocity field measurements. The system simultaneously inverts multiple vials and records videos of the evolving fluid, which are fed into a neural network that approximates the inverse function from visual features and known fluid density. Despite the complex, multi-regime flow within the vial, our approach achieves relative errors below 25%, improving to 15% for viscosities above 0.1 Pa·s. When tested on non-Newtonian polymer solutions, the method reliably estimates zero-shear viscosity as long as viscoelastic or shear-thinning behaviors remain negligible within the flow regime. Moreover, high standard deviations in the inferred values may serve as a proxy for identifying fluids with strong non-Newtonian behavior. The CV viscometer requires only one camera and one motor, is contactless and low-cost, and can be easily integrated into high-throughput automated and manual experimental workflows. Transcending traditional characterization paradigms, our method leverages uncontrolled flows and visual features to achieve simplicity and scalability, enabling high-throughput viscosity inference that can meet the growing demand of data-driven material models while remaining accessible to lower-resource environments.
Submitted 30 May, 2025;
originally announced June 2025.
-
Structured Pruning and Quantization for Learned Image Compression
Authors:
Md Adnan Faisal Hossain,
Fengqing Zhu
Abstract:
The high computational costs associated with large deep learning models significantly hinder their practical deployment. Model pruning has been widely explored in deep learning literature to reduce their computational burden, but its application has been largely limited to computer vision tasks such as image classification and object detection. In this work, we propose a structured pruning method targeted at Learned Image Compression (LIC) models that aims to reduce the computational costs associated with image compression while maintaining the rate-distortion performance. We employ a Neural Architecture Search (NAS) method based on the rate-distortion loss for computing the pruning ratio for each layer of the network. We compare our pruned model with the uncompressed LIC model with the same network architecture and show that it can achieve model size reduction without any BD-Rate performance drop. We further show that our pruning method can be integrated with model quantization to achieve further model compression while maintaining similar BD-Rate performance. We have made the source code available at gitlab.com/viper-purdue/lic-pruning.
Submitted 1 June, 2025;
originally announced June 2025.
-
Flexible Mixed Precision Quantization for Learned Image Compression
Authors:
Md Adnan Faisal Hossain,
Zhihao Duan,
Fengqing Zhu
Abstract:
Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the computational complexity of LIC models. However, most existing works perform fixed-precision quantization which suffers from sub-optimal utilization of resources due to the varying sensitivity to quantization of different layers of a neural network. In this paper, we propose a Flexible Mixed Precision Quantization (FMPQ) method that assigns different bit-widths to different layers of the quantized network using the fractional change in rate-distortion loss as the bit-assignment criterion. We also introduce an adaptive search algorithm which reduces the time-complexity of searching for the desired distribution of quantization bit-widths given a fixed model size. Evaluation of our method shows improved BD-Rate performance under similar model size constraints compared to other works on quantization of LIC models. We have made the source code available at gitlab.com/viper-purdue/fmpq.
Submitted 1 June, 2025;
originally announced June 2025.
-
Electromagnetically Reconfigurable Antennas for 6G: Enabling Technologies, Prototype Studies, and Research Outlook
Authors:
Pinjun Zheng,
Ruiqi Wang,
Yuchen Zhang,
Md. Jahangir Hossain,
Anas Chaaban,
Atif Shamim,
Tareq Y. Al-Naffouri
Abstract:
The transition to the sixth-generation (6G) network is anticipated to redefine wireless transceiver architectures, demanding higher adaptability and efficiency at the antenna layer. Electromagnetically reconfigurable antennas (ERAs) have emerged as a promising solution capable of dynamically reconfiguring wireless channels to meet these requirements. This article presents an overview of recent advancements in ERA technology, underscoring its transformative potential for 6G applications. Drawing from several initial studies, we demonstrate that ERAs can significantly enhance communication rates and hardware efficiency. Nevertheless, critical challenges remain in hardware design and signal processing methodologies, necessitating concerted efforts from both the antenna and communication communities. We identify these gaps and outline key research directions to fully unlock the capabilities of ERAs in next-generation wireless networks.
Submitted 31 May, 2025;
originally announced June 2025.
-
BD Open LULC Map: High-resolution land use land cover mapping & benchmarking for urban development in Dhaka, Bangladesh
Authors:
Mir Sazzat Hossain,
Ovi Paul,
Md Akil Raihan Iftee,
Rakibul Hasan Rajib,
Abu Bakar Siddik Nayem,
Anis Sarker,
Arshad Momen,
Md. Ashraful Amin,
Amin Ahsan Ali,
AKM Mahbubur Rahman
Abstract:
Land Use Land Cover (LULC) mapping using deep learning significantly enhances the reliability of LULC classification, aiding in understanding geography, socioeconomic conditions, poverty levels, and urban sprawl. However, the scarcity of annotated satellite data, especially in South/East Asian developing countries, poses a major challenge due to limited funding, diverse infrastructures, and dense populations. In this work, we introduce the BD Open LULC Map (BOLM), providing pixel-wise LULC annotations across eleven classes (e.g., Farmland, Water, Forest, Urban Structure, Rural Built-Up) for Dhaka metropolitan city and its surroundings using high-resolution Bing satellite imagery (2.22 m/pixel). BOLM spans 4,392 sq km (891 million pixels), with ground truth validated through a three-stage process involving GIS experts. We benchmark LULC segmentation using DeepLab V3+ across five major classes and compare performance on Bing and Sentinel-2A imagery. BOLM aims to support reliable deep models and domain adaptation tasks, addressing critical LULC dataset gaps in South/East Asia.
Submitted 27 May, 2025;
originally announced May 2025.
-
Privacy-Preserving Chest X-ray Report Generation via Multimodal Federated Learning with ViT and GPT-2
Authors:
Md. Zahid Hossain,
Mustofa Ahmed,
Most. Sharmin Sultana Samu,
Md. Rakibul Islam
Abstract:
The automated generation of radiology reports from chest X-ray images holds significant promise in enhancing diagnostic workflows while preserving patient privacy. Traditional centralized approaches often require sensitive data transfer, posing privacy concerns. To address this, the study proposes a Multimodal Federated Learning framework for chest X-ray report generation using the IU-Xray dataset. The system utilizes a Vision Transformer (ViT) as the encoder and GPT-2 as the report generator, enabling decentralized training without sharing raw data. Three Federated Learning (FL) aggregation strategies were evaluated: FedAvg, Krum Aggregation, and a novel Loss-aware Federated Averaging (L-FedAvg). Among these, Krum Aggregation demonstrated superior performance across lexical and semantic evaluation metrics such as ROUGE, BLEU, BERTScore and RaTEScore. The results show that FL can match or surpass centralized models in generating clinically relevant and semantically rich radiology reports. This lightweight and privacy-preserving framework paves the way for collaborative medical AI development without compromising data confidentiality.
Submitted 27 May, 2025;
originally announced May 2025.
-
Biaxial characterization of soft elastomers: experiments and data-adaptive configurational forces for fracture
Authors:
Miguel Angel Moreno-Mateos,
Simon Wiesheier,
Ali Esmaeili,
Mokarram Hossain,
Paul Steinmann
Abstract:
Understanding the fracture mechanics of soft solids remains a fundamental challenge due to their complex, nonlinear responses under large deformations. While multiaxial loading is key to probing their mechanical behavior, the role of such loading in fracture processes is still poorly understood. Here, we present a combined experimental-computational framework to investigate fracture in soft elastomers under equi-biaxial loading. We report original equi-biaxial quasi-static experiments on five elastomeric materials, revealing a spectrum of material and fracture behavior, from brittle-like to highly deformable response with crack tip strains exceeding 150%. Motivated by these observations, we develop a hybrid computational testbed that mirrors the experimental setup and enables virtual biaxial tests. Central to this framework are two components: a data-adaptive formulation of hyperelastic energy functions that flexibly captures material behavior, and a post-processing implementation of the Configurational Force Method, providing a computationally efficient estimate of the J-integral at the crack tip. Our data-adaptive framework for hyperelastic energy functions proves versatile enough to capture with high accuracy the hyperelastic behavior observed in the biaxial experiments. This is important because accurately capturing the constitutive behavior of soft solids is key to a reliable application of the Configurational Force Method. In the limit of crack onset, a critical value of the crack tip configurational force provides a criterion for fracture toughness. Together, our experimental, theoretical, and computational contributions offer a new paradigm for characterizing and designing soft materials with tailored fracture properties.
Submitted 26 May, 2025;
originally announced May 2025.
-
RGC-Bent: A Novel Dataset for Bent Radio Galaxy Classification
Authors:
Mir Sazzat Hossain,
Khan Muhammad Bin Asad,
Payaswini Saikia,
Adrita Khan,
Md Akil Raihan Iftee,
Rakibul Hasan Rajib,
Arshad Momen,
Md Ashraful Amin,
Amin Ahsan Ali,
AKM Mahbubur Rahman
Abstract:
We introduce a novel machine learning dataset tailored for the classification of bent radio active galactic nuclei (AGN) in astronomical observations. Bent radio AGN, distinguished by their curved jet structures, provide critical insights into galaxy cluster dynamics, interactions within the intracluster medium, and the broader physics of AGN. Despite their astrophysical significance, the classification of bent radio AGN remains a challenge due to the scarcity of specialized datasets and benchmarks. To address this, we present a dataset, derived from a well-recognized radio astronomy survey, that is designed to support the classification of NAT (Narrow-Angle Tail) and WAT (Wide-Angle Tail) categories, along with detailed data processing steps. We further evaluate the performance of state-of-the-art deep learning models on the dataset, including Convolutional Neural Networks (CNNs), and transformer-based architectures. Our results demonstrate the effectiveness of advanced machine learning models in classifying bent radio AGN, with ConvNeXT achieving the highest F1-scores for both NAT and WAT sources. By sharing this dataset and benchmarks, we aim to facilitate the advancement of research in AGN classification, galaxy cluster environments and galaxy evolution.
Submitted 25 May, 2025;
originally announced May 2025.
-
CrosGrpsABS: Cross-Attention over Syntactic and Semantic Graphs for Aspect-Based Sentiment Analysis in a Low-Resource Language
Authors:
Md. Mithun Hossain,
Md. Shakil Hossain,
Sudipto Chaki,
Md. Rajib Hossain,
Md. Saifur Rahman,
A. B. M. Shawkat Ali
Abstract:
Aspect-Based Sentiment Analysis (ABSA) is a fundamental task in natural language processing, offering fine-grained insights into opinions expressed in text. Existing research has largely focused on resource-rich languages like English, leveraging large annotated datasets, pre-trained models, and language-specific tools, but these resources are often unavailable for low-resource languages such as Bengali. The ABSA task in Bengali remains poorly explored and is further complicated by its unique linguistic characteristics and a lack of annotated data, pre-trained models, and optimized hyperparameters. To address these challenges, this research proposes CrosGrpsABS, a novel hybrid framework that leverages bidirectional cross-attention between syntactic and semantic graphs to enhance aspect-level sentiment classification. CrosGrpsABS combines transformer-based contextual embeddings with graph convolutional networks, built upon rule-based syntactic dependency parsing and semantic similarity computations. By employing bidirectional cross-attention, the model effectively fuses local syntactic structure with global semantic context, resulting in improved sentiment classification performance across both low- and high-resource settings. We evaluate CrosGrpsABS on four low-resource Bengali ABSA datasets and the high-resource English SemEval 2014 Task 4 dataset. CrosGrpsABS consistently outperforms existing approaches, achieving notable improvements, including a 0.93% F1-score increase for the Restaurant domain and a 1.06% gain for the Laptop domain in the SemEval 2014 Task 4 benchmark.
Submitted 25 May, 2025;
originally announced May 2025.
-
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
Authors:
Md. Mithun Hossain,
Md. Shakil Hossain,
Sudipto Chaki,
M. F. Mridha
Abstract:
Multi-modal learning has become a critical research area because integrating text and image data can significantly improve performance in tasks such as classification, retrieval, and scene understanding. However, despite progress with pre-trained models, current approaches are limited by inadequate cross-modal interactions and static fusion strategies that do not fully exploit the complementary nature of different modalities. To address these shortcomings, we introduce a novel multi-modal Co-AttenDWG architecture that leverages dual-path encoding, co-attention with dimension-wise gating, and advanced expert fusion. Our approach begins by projecting text and image features into a common embedding space, where a dedicated co-attention mechanism enables simultaneous, fine-grained interactions between modalities. This mechanism is further enhanced by a dimension-wise gating network that adaptively regulates the feature contributions at the channel level, ensuring that only the most relevant information is emphasized. In parallel, dual-path encoders refine the representations by processing cross-modal information separately before an additional cross-attention layer further aligns modalities. The refined features are then aggregated via an expert fusion module that combines learned gating and self-attention to produce a robust, unified representation. We validate our approach on the MIMIC and SemEval Memotion 1.0 datasets, where experimental results demonstrate significant improvements in cross-modal alignment and state-of-the-art performance, underscoring the potential of our model for a wide range of multi-modal applications.
Submitted 25 May, 2025;
originally announced May 2025.
-
Advancing Excited-State Properties of 2D Materials Using a Dielectric-Dependent Hybrid Functional
Authors:
Arghya Ghosh,
Subrata Jana,
Manoar Hossain,
Dimple Rani,
Szymon Śmiga,
Prasanjit Samal
Abstract:
Predicting accurate band gaps and optical properties of lower-dimensional materials, including two-dimensional van der Waals (vdW) materials and their heterostructures, remains a challenge within density functional theory (DFT) due to their unique screening compared to their bulk counterparts. Additionally, accurate treatment of the dielectric response is crucial for developing and applying screened-exchange dielectric-dependent range-separated hybrid functionals (SE-DD-RSH) for vdW materials. In this work, we introduce an SE-DD-RSH functional for 2D vdW materials such as MoS2, WS2, hBN, black phosphorus (BP), and β-InSe. By accounting for in-plane and out-of-plane dielectric responses, our method achieves accuracy comparable to advanced many-body techniques such as G0W0 and BSE@G0W0 at a lower computational cost. We demonstrate improved band gap predictions and optical absorption spectra for both bulk and layered structures, including heterostructures such as MoS2/WS2. This approach offers a practical and precise tool for exploring electronic and optical phenomena in 2D materials, paving the way for efficient computational studies of layered systems.
Submitted 22 May, 2025;
originally announced May 2025.
-
FedCTTA: A Collaborative Approach to Continual Test-Time Adaptation in Federated Learning
Authors:
Rakibul Hasan Rajib,
Md Akil Raihan Iftee,
Mir Sazzat Hossain,
A. K. M. Mahbubur Rahman,
Sajib Mistry,
M Ashraful Amin,
Amin Ahsan Ali
Abstract:
Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, making it ideal for privacy-sensitive applications. However, FL models often suffer performance degradation due to distribution shifts between training and deployment. Test-Time Adaptation (TTA) offers a promising solution by allowing models to adapt using only test samples. However, existing TTA methods in FL face challenges such as computational overhead, privacy risks from feature sharing, and scalability concerns due to memory constraints. To address these limitations, we propose Federated Continual Test-Time Adaptation (FedCTTA), a privacy-preserving and computationally efficient framework for federated adaptation. Unlike prior methods that rely on sharing local feature statistics, FedCTTA avoids direct feature exchange by leveraging similarity-aware aggregation based on model output distributions over randomly generated noise samples. This approach ensures adaptive knowledge sharing while preserving data privacy. Furthermore, FedCTTA minimizes the entropy at each client for continual adaptation, enhancing the model's confidence in evolving target distributions. Our method eliminates the need for server-side training during adaptation and maintains a constant memory footprint, making it scalable even as the number of clients or training rounds increases. Extensive experiments show that FedCTTA surpasses existing methods across diverse temporal and spatial heterogeneity scenarios.
Submitted 19 May, 2025;
originally announced May 2025.
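The FedCTTA abstract names two ingredients: entropy minimization at each client and similarity-aware aggregation based on model outputs over shared random noise samples. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the cosine similarity measure, the softmax weighting, and the toy output vectors are assumptions chosen only to make the idea concrete.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution.
    Test-time adaptation of this kind tries to lower this value."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def aggregation_weights(client_outputs, target):
    """Assumed scheme: weight every client by how similar its outputs on the
    shared noise inputs are to the target client's outputs."""
    sims = [cosine(client_outputs[target], out) for out in client_outputs]
    return softmax(sims)

# Hypothetical data: each client's class probabilities on the same two
# noise inputs, flattened into one vector (3 classes x 2 inputs).
client_outputs = [
    [0.7, 0.2, 0.1, 0.6, 0.3, 0.1],
    [0.6, 0.3, 0.1, 0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8, 0.2, 0.1, 0.7],
]
weights = aggregation_weights(client_outputs, target=0)
print(weights)
```

Because only outputs on random noise are compared, no client features or raw data appear anywhere in the weighting step, which is the privacy property the abstract emphasizes.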
-
FreqSelect: Frequency-Aware fMRI-to-Image Reconstruction
Authors:
Junliang Ye,
Lei Wang,
Md Zakir Hossain
Abstract:
Reconstructing natural images from functional magnetic resonance imaging (fMRI) data remains a core challenge in neural decoding due to the mismatch between the richness of visual stimuli and the noisy, low-resolution nature of fMRI signals. While recent two-stage models, combining deep variational autoencoders (VAEs) with diffusion models, have advanced this task, they treat all spatial-frequency components of the input equally. This uniform treatment forces the model to extract meaningful features and suppress irrelevant noise simultaneously, limiting its effectiveness. We introduce FreqSelect, a lightweight, adaptive module that selectively filters spatial-frequency bands before encoding. By dynamically emphasizing frequencies that are most predictive of brain activity and suppressing those that are uninformative, FreqSelect acts as a content-aware gate between image features and neural data. It integrates seamlessly into standard very deep VAE-diffusion pipelines and requires no additional supervision. Evaluated on the Natural Scenes dataset, FreqSelect consistently improves reconstruction quality across both low- and high-level metrics. Beyond performance gains, the learned frequency-selection patterns offer interpretable insights into how different visual frequencies are represented in the brain. Our method generalizes across subjects and scenes, and holds promise for extension to other neuroimaging modalities, offering a principled approach to enhancing both decoding accuracy and neuroscientific interpretability.
Submitted 18 May, 2025;
originally announced May 2025.
-
SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization
Authors:
Haodong Yang,
Lei Wang,
Md Zakir Hossain
Abstract:
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method that injects two trainable low-rank matrices (A and B) into frozen pretrained models. While efficient, LoRA constrains updates to a fixed low-rank subspace (Delta W = BA), which can limit representational capacity and hinder downstream performance. We introduce Subspace Recomposition in Low-Rank Adaptation (SRLoRA) via importance-based fusion and reinitialization, a novel approach that enhances LoRA's expressiveness without compromising its lightweight structure. SRLoRA assigns importance scores to each LoRA pair (a column of B and the corresponding row of A), and dynamically recomposes the subspace during training. Less important pairs are fused into the frozen backbone, freeing capacity to reinitialize new pairs along unused principal directions derived from the pretrained weight's singular value decomposition. This mechanism enables continual subspace refreshment and richer adaptation over time, without increasing the number of trainable parameters. We evaluate SRLoRA on both language and vision tasks, including the GLUE benchmark and various image classification datasets. SRLoRA consistently achieves faster convergence and improved accuracy over standard LoRA, demonstrating its generality, efficiency, and potential for broader PEFT applications.
Submitted 18 May, 2025;
originally announced May 2025.
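The fusion step described in the SRLoRA abstract can be checked on a toy example: folding one LoRA pair (a column of B and the matching row of A) into the frozen weight leaves the effective weight W0 + BA unchanged, while freeing that slot for reinitialization. The sketch below assumes a simple norm-product importance score and uses zeros as a stand-in for reinitialization; the abstract specifies neither detail at this level, and the SVD-based choice of new directions is omitted.

```python
import math

def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[c * r for r in row] for c in col]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def column(M, i):
    return [row[i] for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def effective_weight(W0, B, A):
    """W0 + B A, accumulated one rank-1 pair at a time."""
    W = [row[:] for row in W0]
    for i in range(len(A)):
        W = mat_add(W, outer(column(B, i), A[i]))
    return W

def fuse_least_important(W0, B, A):
    """Fuse the lowest-scoring pair into W0 and clear its slot.
    The score ||b_i|| * ||a_i|| is an assumed importance proxy."""
    r = len(A)
    scores = [norm(column(B, i)) * norm(A[i]) for i in range(r)]
    k = min(range(r), key=scores.__getitem__)
    W0_new = mat_add(W0, outer(column(B, k), A[k]))
    # Zeros are a placeholder; SRLoRA would reinitialize this pair along an
    # unused principal direction of the pretrained weight.
    B_new = [[0.0 if j == k else v for j, v in enumerate(row)] for row in B]
    A_new = [[0.0] * len(A[k]) if i == k else row[:] for i, row in enumerate(A)]
    return W0_new, B_new, A_new, k

# Toy shapes: W0 is 2x2, B is 2x2 (rank r = 2), A is 2x2.
W0 = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.1], [0.5, 0.0]]
A = [[2.0, 0.0], [0.0, 0.3]]

before = effective_weight(W0, B, A)
W0_f, B_f, A_f, fused = fuse_least_important(W0, B, A)
after = effective_weight(W0_f, B_f, A_f)
print(fused, before, after)
```

The number of trainable parameters in B and A is the same before and after fusion, which matches the abstract's claim that the subspace is refreshed without growing the adapter.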
-
Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data
Authors:
Md. Ehsanul Haque,
Md. Saymon Hosen Polash,
Md Al-Imran Sanjida Simla,
Md Alomgir Hossain,
Sarwar Jahan
Abstract:
The rapid growth of Internet of Things (IoT) networks has sharply increased cyber risk, creating a need for intrusion detection systems (IDS) that work well on highly imbalanced datasets. Traditional machine learning models tend to struggle to identify attacks when the volume of normal data far exceeds the volume of attack data, which can result in a high rate of missed threats. For example, the dataset used in this study has a strong class imbalance, with 94,659 instances of the majority class and only 28 instances of the minority class, making it challenging to detect rare attacks accurately. This research addresses these challenges with hybrid sampling techniques designed to improve detection accuracy on imbalanced data in IoT domains. After applying these techniques, we evaluate the performance of several machine learning models, namely Random Forest, Soft Voting, Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Logistic Regression, on the classification of cyber-attacks. The results indicate that the Random Forest model achieved the best performance, with a Kappa score of 0.9903, test accuracy of 0.9961, and AUC of 0.9994. The Soft Voting model also performs strongly, with an accuracy of 0.9952 and AUC of 0.9997, indicating the benefits of combining model predictions. Overall, this work demonstrates the value of hybrid sampling combined with robust model and feature selection for significantly improving IoT security against cyber-attacks, especially in highly imbalanced data environments.
Submitted 15 May, 2025;
originally announced May 2025.
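A short worked example shows why the abstract above reports a Kappa score alongside accuracy. Using only the class counts it states (94,659 majority, 28 minority), a classifier that labels everything as normal reaches very high accuracy but a Cohen's kappa of zero. The degenerate classifier is an illustration added here, not an experiment from the paper.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Accuracy and Cohen's kappa from a binary confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    if expected == 1.0:
        return observed, 0.0
    return observed, (observed - expected) / (1.0 - expected)

majority, minority = 94_659, 28   # class counts quoted in the abstract

# Hypothetical baseline: always predict "normal" (attack = positive class).
acc, kappa = accuracy_and_kappa(tp=0, fp=0, fn=minority, tn=majority)

print(f"imbalance ratio: {majority / minority:.0f} to 1")
print(f"accuracy: {acc:.4f}, kappa: {kappa:.4f}")
```

The baseline misses every attack yet scores above 99.9% accuracy, so a kappa close to 1 is much stronger evidence of real minority-class detection than accuracy alone.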
-
Wavefunction-Free Approach for Predicting Nonlinear Responses in Weyl Semimetals
Authors:
Mohammad Yahyavi,
Ilya Belopolski,
Yuanjun Jin,
Md Shafayat Hossain,
Yilin Zhao,
Jinyang Ni,
Naizhou Wang,
Yi-Chun Hung,
Zi-Jia Cheng,
Tyler A. Cochran,
Tay-Rong Chang,
Wei-bo Gao,
Su-Yang Xu,
Jia-Xin Yin,
Qiong Ma,
M. Zahid Hasan,
Arun Bansil,
Naoto Nagaosa,
Guoqing Chang
Abstract:
By sidestepping the intractable calculations of many-body wavefunctions, density functional theory (DFT) has revolutionized the prediction of ground states of materials. However, predicting nonlinear responses--critical for next-generation quantum devices--still relies heavily on explicit wavefunctions, limiting computational efficiency. In this letter, using the circular photogalvanic effect (CPGE) in Weyl semimetals as a representative example, we realize a 1000-fold computational speedup by eliminating the explicit dependence on wavefunctions. Our approach leverages the one-to-one correspondence between free parameters of Weyl fermions and the associated responses to obtain precise wavefunction-free formulations. Applying our methodology, we systematically investigated known Weyl semimetals and revealed that Ta$_3$S$_2$ exhibits photocurrents an order of magnitude greater than those observed in TaAs, with potential for an additional order-of-magnitude enhancement under strain. Our work paves the way for substantially more efficient screening and optimization of nonlinear electromagnetic properties of topological quantum materials.
Submitted 14 May, 2025;
originally announced May 2025.
-
Tri-Hybrid Multi-User Precoding Using Pattern-Reconfigurable Antennas: Fundamental Models and Practical Algorithms
Authors:
Pinjun Zheng,
Yuchen Zhang,
Tareq Y. Al-Naffouri,
Md. Jahangir Hossain,
Anas Chaaban
Abstract:
The integration of pattern-reconfigurable antennas into hybrid multiple-input multiple-output (MIMO) architectures presents a promising path toward high-efficiency and low-cost transceiver solutions. Pattern-reconfigurable antennas can dynamically steer per-antenna radiation patterns, enabling more efficient power utilization and interference suppression. In this work, we study a tri-hybrid MIMO architecture for multi-user communication that integrates digital, analog, and antenna-domain precoding using pattern-reconfigurable antennas. For characterizing the reconfigurability of antenna radiation patterns, we develop two models, Model I and Model II. Model I captures realistic hardware constraints through limited pattern selection, while Model II explores the performance upper bound by assuming arbitrary pattern generation. Based on these models, we develop two corresponding tri-hybrid precoding algorithms grounded in the weighted minimum mean square error (WMMSE) framework, which alternately optimize the digital, analog, and antenna precoders under practical per-antenna power constraints. Realistic simulations conducted in ray-tracing generated environments are utilized to evaluate the proposed system and algorithms. The results demonstrate the significant potential of the considered tri-hybrid architecture in enhancing communication performance and hardware efficiency. However, they also reveal that the existing hardware is not yet capable of fully realizing these performance gains, underscoring the need for joint progress in antenna design and communication theory development.
Submitted 13 May, 2025;
originally announced May 2025.
-
Bi-LSTM based Multi-Agent DRL with Computation-aware Pruning for Agent Twins Migration in Vehicular Embodied AI Networks
Authors:
Yuxiang Wei,
Zhuoqi Zeng,
Yue Zhong,
Jiawen Kang,
Ryan Wen Liu,
M. Shamim Hossain
Abstract:
With the advancement of large language models and embodied Artificial Intelligence (AI), their combination in intelligent transportation scenarios has given rise to Vehicular Embodied AI Networks (VEANs). In VEANs, Autonomous Vehicles (AVs) are typical agents whose local advanced AI applications are defined as vehicular embodied AI agents, enabling capabilities such as environment perception and multi-agent collaboration. Due to computation latency and resource constraints, the local AI applications and services running on vehicular embodied AI agents need to be migrated to Roadside Units (RSUs), where they are referred to as vehicular embodied AI agent twins; offloading intensive tasks in this way mitigates latency problems while maintaining service quality. Recognizing workload imbalance among RSUs in traditional approaches, we model AV-RSU interactions as a Stackelberg game to optimize bandwidth resource allocation for efficient migration. A Tiny Multi-Agent Bidirectional LSTM Proximal Policy Optimization (TMABLPPO) algorithm is designed to approximate the Stackelberg equilibrium through decentralized coordination. Furthermore, a personalized neural network pruning algorithm based on Path eXclusion (PX) dynamically adapts to heterogeneous AV computation capabilities by identifying task-critical parameters in trained models, reducing model complexity with less performance degradation. Experimental validation confirms the algorithm's effectiveness in balancing system load and minimizing delays, demonstrating significant improvements in vehicular embodied AI agent deployment.
Submitted 9 May, 2025;
originally announced May 2025.
-
ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling
Authors:
Xiao Wang,
Jong-Youl Choi,
Takuya Kurihaya,
Isaac Lyngaas,
Hong-Jun Yoon,
Ming Fan,
Nasik Muhammad Nafi,
Aristeidis Tsaris,
Ashwin M. Aji,
Maliha Hossain,
Mohamed Wahib,
Dali Wang,
Peter Thornton,
Prasanna Balaprakash,
Moetasim Ashfaq,
Dan Lu
Abstract:
Sparse observations and coarse-resolution climate models limit effective regional decision-making, underscoring the need for robust downscaling. However, existing AI methods struggle with generalization across variables and geographies and are constrained by the quadratic complexity of Vision Transformer (ViT) self-attention. We introduce ORBIT-2, a scalable foundation model for global, hyper-resolution climate downscaling. ORBIT-2 incorporates two key innovations: (1) Residual Slim ViT (Reslim), a lightweight architecture with residual learning and Bayesian regularization for efficient, robust prediction; and (2) TILES, a tile-wise sequence scaling algorithm that reduces self-attention complexity from quadratic to linear, enabling long-sequence processing and massive parallelism. ORBIT-2 scales to 10 billion parameters across 32,768 GPUs, achieving up to 1.8 ExaFLOPS sustained throughput and 92-98% strong scaling efficiency. It supports downscaling to 0.9 km global resolution and processes sequences up to 4.2 billion tokens. On 7 km resolution benchmarks, ORBIT-2 achieves high accuracy with R^2 scores in the range of 0.98 to 0.99 against observation data.
Submitted 7 May, 2025;
originally announced May 2025.
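The ORBIT-2 abstract states that tile-wise sequence scaling reduces self-attention complexity from quadratic to linear. A back-of-the-envelope count of query-key pairs illustrates why restricting attention to fixed-size tiles has that effect. This is only a cost model under assumed sizes: the tile length of 1,024 tokens is hypothetical, and any overlap or cross-tile communication the actual TILES algorithm may use is ignored.

```python
def full_attention_pairs(n_tokens):
    """Query-key pairs when every token attends to every token."""
    return n_tokens * n_tokens

def tiled_attention_pairs(n_tokens, tile_tokens):
    """Query-key pairs when attention is restricted to independent tiles."""
    n_full, rest = divmod(n_tokens, tile_tokens)
    return n_full * tile_tokens ** 2 + rest ** 2

TILE = 1_024  # hypothetical tile length, not taken from the paper

for n in (4_096, 8_192, 16_384):
    print(n, full_attention_pairs(n), tiled_attention_pairs(n, TILE))
```

Doubling the sequence length quadruples the full-attention count but only doubles the tiled count, and because tiles are independent they can be processed in parallel.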
-
Atom-by-atom Imaging of Moiré Phasons using Electron Ptychography
Authors:
Yichao Zhang,
Ballal Ahammed,
Sang Hyun Bae,
Chia-Hao Lee,
Jeffrey Huang,
Mohammad Abir Hossain,
Tawfiqur Rakib,
Arend van der Zande,
Elif Ertekin,
Pinshane Y. Huang
Abstract:
Twisted 2D materials exhibit unique vibrational modes called moiré phonons, which arise from the moiré superlattice. Here, we demonstrate atom-by-atom imaging of phasons, an ultrasoft class of moiré phonons in twisted bilayer WSe2. Using ultrahigh-resolution (<15 pm) electron ptychography, we image the size and shape of each atom to extract time-averaged vibrational amplitudes as a function of twist angle and position. We observe several signature properties of moiré phasons, such as increased vibrational amplitudes at solitons and AA-stacked regions. By correlating experiments with molecular dynamics simulations and lattice dynamics calculations, we show phasons dominate the thermal vibrations in low-angle twisted bilayers. These results represent a powerful route to image thermal vibrations at atomic resolution, unlocking experimental studies of a thus-far hidden branch of moiré phonon physics.
Submitted 5 May, 2025;
originally announced May 2025.
-
Tri-Hybrid Multi-User Precoding Based on Electromagnetically Reconfigurable Antennas
Authors:
Pinjun Zheng,
Yuchen Zhang,
Tareq Y. Al-Naffouri,
Md. Jahangir Hossain,
Anas Chaaban
Abstract:
The tri-hybrid precoding architecture based on electromagnetically reconfigurable antennas (ERAs) is a promising solution for overcoming key limitations in multiple-input multiple-output communication systems. Aiming to further understand its potential, this paper investigates the tri-hybrid multi-user precoding problem using pattern reconfigurable ERAs. To reduce model complexity and improve practicality, we characterize each antenna's radiation pattern using a spherical harmonics decomposition. While mathematically tractable, this approach may lead to over-optimized patterns that are physically unrealizable. To address this, we introduce a projection step that maps the optimized patterns onto a realizable set. Simulation results demonstrate that spherical harmonics-based radiation pattern optimization significantly enhances sum rate performance. However, after projection onto a realizable set obtained from real ERA hardware, the performance gain is notably reduced or even negligible, underscoring the need for more effective projection techniques and improved reconfigurable antenna hardware.
Submitted 4 May, 2025;
originally announced May 2025.
-
Explainable AI-Driven Detection of Human Monkeypox Using Deep Learning and Vision Transformers: A Comprehensive Analysis
Authors:
Md. Zahid Hossain,
Md. Rakibul Islam,
Most. Sharmin Sultana Samu
Abstract:
Mpox is a zoonotic viral illness that can spread from person to person and poses a significant public health concern. It is difficult to make an early clinical diagnosis because of how closely its symptoms match those of measles and chickenpox. Medical imaging combined with deep learning (DL) techniques has shown promise in improving disease detection by analyzing affected skin areas. Our study explores the feasibility of training deep learning and vision transformer-based models from scratch on a publicly available skin lesion image dataset. Our experimental results show that dataset limitations are a major drawback when building classifier models trained from scratch. We therefore used transfer learning with pre-trained models to obtain a better classifier. MobileNet-v2 outperformed other state-of-the-art pre-trained models with 93.15% accuracy and a 93.09% weighted average F1 score. ViT B16 and ResNet-50 also achieved satisfactory performance compared to previously published studies, with accuracies of 92.12% and 86.21%, respectively. To further validate the performance of the models, we applied explainable AI techniques.
Submitted 3 April, 2025;
originally announced May 2025.
-
Design and Application of Multimodal Large Language Model Based System for End to End Automation of Accident Dataset Generation
Authors:
MD Thamed Bin Zaman Chowdhury,
Moazzem Hossain
Abstract:
Road traffic accidents remain a major public safety and socio-economic issue in developing countries like Bangladesh. Existing accident data collection is largely manual, fragmented, and unreliable, resulting in underreporting and inconsistent records. This research proposes a fully automated system using Large Language Models (LLMs) and web scraping techniques to address these challenges. The pipeline consists of four components: automated web scraping code generation, news collection from online sources, accident news classification with structured data extraction, and duplicate removal. The system uses the multimodal generative LLM Gemini-2.0-Flash for seamless automation. The code generation module classifies webpages into pagination, dynamic, or infinite scrolling categories and generates suitable Python scripts for scraping. LLMs also classify and extract key accident information such as date, time, location, fatalities, injuries, road type, vehicle types, and pedestrian involvement. A deduplication algorithm ensures data integrity by removing duplicate reports. The system scraped 14 major Bangladeshi news sites over 111 days (Oct 1, 2024 - Jan 20, 2025), processing over 15,000 news articles and identifying 705 unique accidents. The code generation module achieved 91.3% calibration and 80% validation accuracy. Chittagong reported the highest number of accidents (80), fatalities (70), and injuries (115), followed by Dhaka, Faridpur, Gazipur, and Cox's Bazar. Peak accident times were morning (8-9 AM), noon (12-1 PM), and evening (6-7 PM). A public repository was also developed with usage instructions. This study demonstrates the viability of an LLM-powered, scalable system for accurate, low-effort accident data collection, providing a foundation for data-driven road safety policymaking in Bangladesh.
Submitted 23 April, 2025;
originally announced May 2025.
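The accident-dataset abstract above lists duplicate removal as the last pipeline stage but does not describe the algorithm. One plausible minimal form is to key each extracted record on a few normalized fields and keep the first report per key. Everything in this sketch is hypothetical: the AccidentReport fields, the choice of key, the source names, and the sample records are illustrations, not the authors' schema or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccidentReport:
    """Hypothetical structured record extracted from one news article."""
    source: str
    date: str
    location: str
    fatalities: int
    injuries: int

def dedup_key(report):
    """Assumed identity of an accident: same day, place, and death toll."""
    return (report.date, report.location.strip().lower(), report.fatalities)

def deduplicate(reports):
    """Keep the first report seen for each key, preserving input order."""
    seen = {}
    for report in reports:
        seen.setdefault(dedup_key(report), report)
    return list(seen.values())

# Invented sample records: the first two describe the same event.
reports = [
    AccidentReport("site-a", "2024-10-05", "Chittagong", 2, 5),
    AccidentReport("site-b", "2024-10-05", " chittagong ", 2, 4),
    AccidentReport("site-a", "2024-10-06", "Dhaka", 1, 0),
]
unique = deduplicate(reports)
print(len(unique))
```

Injury counts are left out of the key here because different outlets often report different figures for the same event; a production system would likely need fuzzier matching than exact keys.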
-
Low latency FPGA implementation of twisted Edward curve cryptography hardware accelerator over prime field
Authors:
Md Rownak Hossain,
Md Sazedur Rahman,
Kh Shahriya Zaman,
Walid El Fezzani,
Mohammad Arif Sobhan Bhuiyan,
Chia Chao Kang,
Teh Jia Yew,
Mahdi H. Miraz
Abstract:
The performance of any elliptic curve cryptography hardware accelerator relies significantly on the efficiency of the underlying point multiplication (PM) architecture. This article presents a field-programmable gate array (FPGA) implementation of modular arithmetic, group operation, and point multiplication units on the twisted Edwards curve (Edwards25519) over the 256-bit prime field. An original hardware architecture of a unified point operation module in projective coordinates, which executes point addition and point doubling within a single module, has been developed, taking only 646 clock cycles and ensuring a better security level than conventional approaches. The proposed point multiplication module takes 1.4 ms (164,730 clock cycles at a maximum clock frequency of 117.8 MHz) and achieves a throughput of 183.38 kbps on the Xilinx Virtex-5 FPGA platform for a 256-bit key. A comparative assessment of latency and throughput against related recent works indicates the effectiveness of the proposed PM architecture. This high-throughput, low-latency PM architecture is a good candidate for rapid data encryption in high-speed wireless communication networks.
Submitted 30 April, 2025;
originally announced April 2025.
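The "unified point operation" in the abstract above refers to a property of twisted Edwards curves: one addition formula handles both adding two distinct points and doubling a point. The software sketch below shows that property on Edwards25519 using the standard curve parameters. It works in affine coordinates with a modular inversion per addition, whereas the paper's hardware uses projective coordinates, and the double-and-add loop is neither constant-time nor a model of the FPGA datapath.

```python
P = 2**255 - 19                                        # prime field modulus
D = (-121665 * pow(121666, P - 2, P)) % P              # curve constant d
L = 2**252 + 27742317777372353535851937790883648493    # order of the base point

def inv(x):
    """Modular inverse via Fermat's little theorem."""
    return pow(x % P, P - 2, P)

def recover_x(y):
    """Even x coordinate satisfying -x^2 + y^2 = 1 + d x^2 y^2."""
    xx = (y * y - 1) * inv(D * y * y + 1) % P
    x = pow(xx, (P + 3) // 8, P)
    if (x * x - xx) % P != 0:
        x = x * pow(2, (P - 1) // 4, P) % P
    return x if x % 2 == 0 else P - x

BASE_Y = 4 * inv(5) % P
BASE = (recover_x(BASE_Y), BASE_Y)
IDENTITY = (0, 1)

def on_curve(point):
    x, y = point
    return (-x * x + y * y - 1 - D * x * x * y * y) % P == 0

def add(p1, p2):
    """Unified addition: the same formula is valid when p1 == p2."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * inv(1 + t) % P
    y3 = (y1 * y2 + x1 * x2) * inv(1 - t) % P
    return (x3, y3)

def scalar_mult(k, point):
    """Right-to-left double-and-add point multiplication."""
    result, addend = IDENTITY, point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)   # doubling reuses add()
        k >>= 1
    return result

doubled = add(BASE, BASE)
print(on_curve(BASE), on_curve(doubled), scalar_mult(L, BASE) == IDENTITY)
```

Since `add` needs no special case for doubling, a single module can serve both operations, which is the design choice the abstract highlights.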
-
Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh
Authors:
MD Thamed Bin Zaman Chowdhury,
Moazzem Hossain,
Md. Ridwanul Islam
Abstract:
Road accidents pose significant concerns globally. They lead to large financial losses, injuries, disabilities, and societal challenges. Accurate and timely accident data is essential for predicting and mitigating these events. This paper presents a novel framework named 'Durghotona GPT' that integrates web scraping and Large Language Models (LLMs) to automate the generation of comprehensive accident datasets from prominent national dailies in Bangladesh. The authors collected accident reports from three major newspapers: Prothom Alo, Dhaka Tribune, and The Daily Star. The collected news was then processed using the newest available LLMs: GPT-4, GPT-3.5, and Llama-3. The framework efficiently extracts relevant information, categorizes reports, and compiles detailed datasets. Thus, this framework overcomes limitations of manual data collection methods such as delays, errors, and communication gaps. The authors' evaluation demonstrates that Llama-3, an open-source model, performs comparably to GPT-4. It achieved 89% accuracy in the authors' evaluation. Therefore, it can be considered a cost-effective alternative for similar tasks. The results suggest that the framework developed by the authors can drastically enhance the quality and availability of accident data. As a result, it can support critical applications in traffic safety analysis, urban planning, and public health. The authors also developed an interface for 'Durghotona GPT' for ease of use as part of this paper. Future work will focus on expanding data collection methods and refining LLMs to further increase dataset accuracy and applicability.
Submitted 23 April, 2025;
originally announced April 2025.
-
Tunable Thermal Expansion in Functionalized 2D Boron Nitride: A First-Principles Investigation
Authors:
Sk Mujaffar Hossain,
Dobin Kim,
Jaehyun Park,
Seung-Cheol Lee,
Satadeep Bhattacharjee
Abstract:
This study investigates the thermal expansion coefficient of two-dimensional (2D) functionalized boron nitride (f-BN) materials using first-principles density functional theory (DFT). Two-dimensional materials, particularly hexagonal boron nitride (h-BN), have attracted significant attention due to their exceptional mechanical, thermal, and electronic properties. However, the influence of functionalization on the thermal expansion behavior remains largely unexplored. In this work, DFT calculations are employed to analyze how different functionalized forms of h-BN impact the thermal expansion of BN sheets. Density functional perturbation theory (DFPT) and the quasiharmonic approximation (QHA) are utilized to determine the thermal expansion coefficient over a range of temperatures. The results reveal that functionalization induces notable modifications in the in-plane thermal expansion of BN, affecting material stability and suggesting potential applications in nanoelectronics and thermal management. This investigation provides critical insights into the tunability of the thermal properties of 2D BN, underscoring its suitability for next-generation flexible and high-performance devices.
Submitted 29 April, 2025;
originally announced April 2025.
-
Metrics on $C^{\ast}$-algebras of Étale groupoids from length functions
Authors:
Arnab Chattopadhyay,
Md Amir Hossain,
Soumalya Joardar
Abstract:
We show that for an étale groupoid with compact unit space, the natural Dirac type operator from a continuous length function produces a natural pseudo-metric on the state space of the corresponding reduced $C^{\ast}$-algebra. For a transformation groupoid with a continuous, proper length function with rapid decay, the state space decomposes into genuine metric spaces with a uniform finite diameter fibred over the state space of the compact unit space. Moreover, when the unit space of the transformation groupoid has finitely many points, the metric on each fibre metrizes the weak*-topology.
Submitted 18 April, 2025;
originally announced April 2025.
-
Leveraging Functional Encryption and Deep Learning for Privacy-Preserving Traffic Forecasting
Authors:
Isaac Adom,
Mohammmad Iqbal Hossain,
Hassan Mahmoud,
Ahmad Alsharif,
Mahmoud Nabil Mahmoud,
Yang Xiao
Abstract:
Over the past few years, traffic congestion has continuously plagued the nation's transportation system creating several negative impacts including longer travel times, increased pollution rates, and higher collision risks. To overcome these challenges, Intelligent Transportation Systems (ITS) aim to improve mobility and vehicular systems, ensuring higher levels of safety by utilizing cutting-edge technologies, sophisticated sensing capabilities, and innovative algorithms. Drivers' participatory sensing, current/future location reporting, and machine learning algorithms have considerably improved real-time congestion monitoring and future traffic management. However, each driver's sensitive spatiotemporal location information can create serious privacy concerns. To address these challenges, we propose in this paper a secure, privacy-preserving location reporting and traffic forecasting system that guarantees privacy protection of driver data while maintaining high traffic forecasting accuracy. Our novel k-anonymity scheme utilizes functional encryption to aggregate encrypted location information submitted by drivers while ensuring the privacy of driver location data. Additionally, using the aggregated encrypted location information as input, this research proposes a deep learning model that incorporates a Convolutional-Long Short-Term Memory (Conv-LSTM) module to capture spatial and short-term temporal features and a Bidirectional Long Short-Term Memory (Bi-LSTM) module to recover long-term periodic patterns for traffic forecasting. With extensive evaluation on real datasets, we demonstrate the effectiveness of the proposed scheme with less than 10% mean absolute error for a 60-minute forecasting horizon, all while protecting driver privacy.
Submitted 17 April, 2025;
originally announced April 2025.
-
Tabular foundation model to detect empathy from visual cues
Authors:
Md Rakibul Hasan,
Shafin Rahman,
Md Zakir Hossain,
Aneesh Krishna,
Tom Gedeon
Abstract:
Detecting empathy from video interactions is an emerging area of research. Video datasets, however, are often released as extracted features (i.e., tabular data) rather than raw footage due to privacy and ethical concerns. Prior research on such tabular datasets established tree-based classical machine learning approaches as the best-performing models. Motivated by the recent success of textual foundation models (i.e., large language models), we explore the use of tabular foundation models in empathy detection from tabular visual features. We experiment with two recent tabular foundation models $-$ TabPFN v2 and TabICL $-$ through in-context learning and fine-tuning setups. Our experiments on a public human-robot interaction benchmark demonstrate a significant boost in cross-subject empathy detection accuracy over several strong baselines (accuracy: $0.590 \rightarrow 0.730$; AUC: $0.564 \rightarrow 0.669$). In addition to performance improvement, we contribute novel insights and an evaluation setup to ensure generalisation on unseen subjects in this public benchmark. As the practice of releasing video features as tabular datasets is likely to persist due to privacy constraints, our findings will be widely applicable to future empathy detection video datasets as well.
Submitted 14 April, 2025;
originally announced April 2025.
-
Ultrafast dynamics of ferroelectric polarization of NbOI$_{2}$ captured with femtosecond electron diffraction
Authors:
Yibo Wang,
Md Sazzad Hossain,
Tianlin Li,
Yanwei Xiong,
Cuong Le,
Jesse Kuebler,
Nina Raghavan,
Lucia Fernandez-Ballester,
Xia Hong,
Alexander Sinitskii,
Martin Centurion
Abstract:
Two-dimensional (2D) ferroelectric materials like NbOI$_{2}$ have garnered significant interest, yet their temporal response and synergetic interaction with light remain underexplored. Previous studies on the polarization of oxide ferroelectrics have relied on time-resolved optical second harmonic generation or ultrafast X-ray scattering. Here, we probe the laser-induced polarization dynamics of 2D NbOI$_{2}$ nanocrystals using ultrafast transmission electron diffraction and deflectometry. The deflection of the electron pulses is directly sensitive to the changes in the polarization, while the diffraction signal captures the structural evolution. Excited with a UV laser pulse, the polarization of NbOI$_{2}$ is initially suppressed for two picoseconds, then it recovers and overshoots, leading to a transiently enhanced polarization persisting for over 200 ps. This recovery coincides with coherent acoustic phonon generation, triggering a piezoresponse in the NbOI$_{2}$ nanocrystals. Our results offer a new method for sensing the ferroelectric order parameter in femtosecond time scales.
Submitted 10 April, 2025;
originally announced April 2025.
-
A Cascaded Architecture for Extractive Summarization of Multimedia Content via Audio-to-Text Alignment
Authors:
Tanzir Hossain,
Ar-Rafi Islam,
Md. Sabbir Hossain,
Annajiat Alim Rasel
Abstract:
This study presents a cascaded architecture for extractive summarization of multimedia content via audio-to-text alignment. The proposed framework addresses the challenge of extracting key insights from multimedia sources like YouTube videos. It integrates audio-to-text conversion using Microsoft Azure Speech with advanced extractive summarization models, including Whisper, Pegasus, and Facebook BART XSum. The system employs tools such as Pytube, Pydub, and SpeechRecognition for content retrieval, audio extraction, and transcription. Linguistic analysis is enhanced through named entity recognition and semantic role labeling. Evaluation using ROUGE and F1 scores demonstrates that the cascaded architecture outperforms conventional summarization methods, despite challenges like transcription errors. Future improvements may include model fine-tuning and real-time processing. This study contributes to multimedia summarization by improving information retrieval, accessibility, and user experience.
Submitted 6 March, 2025;
originally announced April 2025.
-
Enhancing Trust in AI Marketplaces: Evaluating On-Chain Verification of Personalized AI models using zk-SNARKs
Authors:
Nishant Jagannath,
Christopher Wong,
Braden Mcgrath,
Md Farhad Hossain,
Asuquo A. Okon,
Abbas Jamalipour,
Kumudu S. Munasinghe
Abstract:
The rapid advancement of artificial intelligence (AI) has brought about sophisticated models capable of various tasks ranging from image recognition to natural language processing. As these models continue to grow in complexity, ensuring their trustworthiness and transparency becomes critical, particularly in decentralized environments where traditional trust mechanisms are absent. This paper addresses the challenge of verifying personalized AI models in such environments, focusing on their integrity and privacy. We propose a novel framework that integrates zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs) with Chainlink decentralized oracles to verify AI model performance claims on blockchain platforms. Our key contribution lies in integrating zk-SNARKs with Chainlink oracles to securely fetch and verify external data to enable trustless verification of AI models on a blockchain. Our approach addresses the limitations of using unverified external data for AI verification on the blockchain while preserving sensitive information of AI models and enhancing transparency. We demonstrate our methodology with a linear regression model predicting Bitcoin prices using on-chain data verified on the Sepolia testnet. Our results indicate the framework's efficacy, with key metrics including proof generation taking an average of 233.63 seconds and verification time of 61.50 seconds. This research paves the way for transparent and trustless verification processes in blockchain-enabled AI ecosystems, addressing key challenges such as model integrity and model privacy protection. The proposed framework, while exemplified with linear regression, is designed for broader applicability across more complex AI models, setting the stage for future advancements in transparent AI verification.
Submitted 7 April, 2025;
originally announced April 2025.
-
Classification of ADHD and Healthy Children Using EEG Based Multi-Band Spatial Features Enhancement
Authors:
Md Bayazid Hossain,
Md Anwarul Islam Himel,
Md Abdur Rahim,
Shabbir Mahmood,
Abu Saleh Musa Miah,
Jungpil Shin
Abstract:
Attention Deficit Hyperactivity Disorder (ADHD) is a common neurodevelopmental disorder in children, characterized by difficulties in attention, hyperactivity, and impulsivity. Early and accurate diagnosis of ADHD is critical for effective intervention and management. Electroencephalogram (EEG) signals have emerged as a non-invasive and efficient tool for ADHD detection due to their high temporal resolution and ability to capture neural dynamics. In this study, we propose a method for classifying ADHD and healthy children using EEG data from the benchmark dataset. There were 61 children with ADHD and 60 healthy children, both boys and girls, aged 7 to 12. The EEG signals, recorded from 19 channels, were processed to extract Power Spectral Density (PSD) and Spectral Entropy (SE) features across five frequency bands, resulting in a comprehensive 190-dimensional feature set. To evaluate the classification performance, a Support Vector Machine (SVM) with the RBF kernel demonstrated the best performance with a mean cross-validation accuracy of 99.2\% and a standard deviation of 0.0079, indicating high robustness and precision. These results highlight the potential of spatial features in conjunction with machine learning for accurately classifying ADHD using EEG data. This work contributes to developing non-invasive, data-driven tools for early diagnosis and assessment of ADHD in children.
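The feature count in this abstract follows from 19 channels × 5 bands × 2 features (PSD and SE) = 190. The sketch below illustrates that bookkeeping with the standard library only. The band names and edges, the 128 Hz sampling rate, the naive periodogram, and the use of mean band power as the PSD feature are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' pipeline.

```python
import cmath
import math

# Hypothetical band edges in Hz; the abstract only says "five frequency bands".
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def periodogram(signal, fs):
    """Naive O(n^2) one-sided periodogram: returns (frequencies, power)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_features(freqs, power, lo, hi):
    """Return (mean band power, spectral entropy in bits) for lo <= f < hi."""
    band = [p for f, p in zip(freqs, power) if lo <= f < hi]
    total = sum(band)
    if total <= 0:
        return 0.0, 0.0
    probs = [p / total for p in band if p > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return total / len(band), entropy


def feature_vector(channels, fs):
    """Concatenate (power, entropy) per band per channel."""
    feats = []
    for sig in channels:
        freqs, power = periodogram(sig, fs)
        for lo, hi in BANDS.values():
            feats.extend(band_features(freqs, power, lo, hi))
    return feats


# Synthetic stand-in for a recording: 19 channels of a 10 Hz sine, 1 second each.
fs = 128  # assumed sampling rate
channels = [[math.sin(2 * math.pi * 10 * t / fs + ch) for t in range(fs)]
            for ch in range(19)]
vec = feature_vector(channels, fs)
print(len(vec))
```

A vector of this shape could then be passed to any classifier, such as the RBF-kernel SVM the abstract mentions.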
Submitted 6 April, 2025;
originally announced April 2025.
-
AutoPsyC: Automatic Recognition of Psychodynamic Conflicts from Semi-structured Interviews with Large Language Models
Authors:
Sayed Muddashir Hossain,
Simon Ostermann,
Patrick Gebhard,
Cord Benecke,
Josef van Genabith,
Philipp Müller
Abstract:
Psychodynamic conflicts are persistent, often unconscious themes that shape a person's behaviour and experiences. Accurate diagnosis of psychodynamic conflicts is crucial for effective patient treatment and is commonly done via long, manually scored semi-structured interviews. Existing automated solutions for psychiatric diagnosis tend to focus on the recognition of broad disorder categories such as depression, and it is unclear to what extent psychodynamic conflicts, which even the patients themselves may not have conscious access to, could be automatically recognised from conversation. In this paper, we propose AutoPsyC, the first method for recognising the presence and significance of psychodynamic conflicts from full-length Operationalized Psychodynamic Diagnostics (OPD) interviews using Large Language Models (LLMs). Our approach combines recent advances in parameter-efficient fine-tuning and Retrieval-Augmented Generation (RAG) with a summarisation strategy to effectively process entire 90-minute-long conversations. In evaluations on a dataset of 141 diagnostic interviews, we show that AutoPsyC consistently outperforms all baselines and ablation conditions on the recognition of four highly relevant psychodynamic conflicts.
Submitted 27 March, 2025;
originally announced March 2025.
-
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data
Authors:
Md. Shaheenur Islam Sumon,
Md. Sakib Bin Islam,
Md. Sohanur Rahman,
Md. Sakib Abrar Hossain,
Amith Khandakar,
Anwarul Hasan,
M Murugappan,
Muhammad E. H. Chowdhury
Abstract:
The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. A multi-headed self-attention mechanism, widely used in natural language processing (NLP), is operated by Transformers to understand feature interactions in feature spaces. However, the relationships between various features within biological systems remain ambiguous in these spaces, highlighting the necessity of early detection and prediction of cardiovascular diseases to reduce the severe morbidity and mortality associated with these conditions worldwide. We handle this issue with CardioTabNet, which exploits the strength of the tab transformer to extract a feature space that carries a strong understanding of clinical cardiovascular data and its feature ranking. As a result, the performance of downstream classical models improved significantly. Our study utilizes the open-source dataset for heart disease prediction with 1190 instances and 11 features. In total, 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope). The tab transformer was used to extract important features, which were ranked using the random forest (RF) feature ranking algorithm. Ten machine-learning models were used to predict heart disease using selected features. After extracting high-quality features, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy rate of 94.1% and an average Area Under Curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. A benchmarking study was conducted using state-of-the-art models to evaluate our transformer-driven framework.
Submitted 22 March, 2025;
originally announced March 2025.
-
MobilePlantViT: A Mobile-friendly Hybrid ViT for Generalized Plant Disease Image Classification
Authors:
Moshiur Rahman Tonmoy,
Md. Mithun Hossain,
Nilanjan Dey,
M. F. Mridha
Abstract:
Plant diseases significantly threaten global food security by reducing crop yields and undermining agricultural sustainability. AI-driven automated classification has emerged as a promising solution, with deep learning models demonstrating impressive performance in plant disease identification. However, deploying these models on mobile and edge devices remains challenging due to high computational demands and resource constraints, highlighting the need for lightweight, accurate solutions for accessible smart agriculture systems. To address this, we propose MobilePlantViT, a novel hybrid Vision Transformer (ViT) architecture designed for generalized plant disease classification, which optimizes resource efficiency while maintaining high performance. Extensive experiments across diverse plant disease datasets of varying scales show our model's effectiveness and strong generalizability, achieving test accuracies ranging from 80% to over 99%. Notably, with only 0.69 million parameters, our architecture outperforms the smallest versions of MobileViTv1 and MobileViTv2, despite their higher parameter counts. These results underscore the potential of our approach for real-world, AI-powered automated plant disease classification in sustainable and resource-efficient smart agriculture systems. All code will be available in the GitHub repository: https://github.com/moshiurtonmoy/MobilePlantViT
Submitted 20 March, 2025;
originally announced March 2025.
-
Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions
Authors:
Hadi Amini,
Md Jueal Mia,
Yasaman Saadati,
Ahmed Imteaj,
Seyedsina Nabavirazavi,
Urmish Thakker,
Md Zarif Hossain,
Awal Ahmed Fime,
S. S. Iyengar
Abstract:
Language models (LMs) are machine learning models designed to predict linguistic patterns by estimating the probability of word sequences based on large-scale datasets, such as text. LMs have a wide range of applications in natural language processing (NLP) tasks, including autocomplete and machine translation. Although larger datasets typically enhance LM performance, scalability remains a challenge due to constraints in computational power and resources. Distributed computing strategies offer essential solutions for improving scalability and managing the growing computational demand. Further, the use of sensitive datasets in training and deployment raises significant privacy concerns. Recent research has focused on developing decentralized techniques to enable distributed training and inference while utilizing diverse computational resources and enabling edge AI. This paper presents a survey on distributed solutions for various LMs, including large language models (LLMs), vision language models (VLMs), multimodal LLMs (MLLMs), and small language models (SLMs). While LLMs focus on processing and generating text, MLLMs are designed to handle multiple modalities of data (e.g., text, images, and audio) and to integrate them for broader applications. To this end, this paper reviews key advancements across the MLLM pipeline, including distributed training, inference, fine-tuning, and deployment, while also identifying the contributions, limitations, and future areas of improvement. Further, it categorizes the literature based on six primary focus areas of decentralization. Our analysis describes gaps in current methodologies for enabling distributed solutions for LMs and outlines future research directions, emphasizing the need for novel solutions to enhance the robustness and applicability of distributed LMs.
Submitted 20 March, 2025;
originally announced March 2025.
-
Gravitational lensing due to charged galactic wormhole
Authors:
Md Khalid Hossain,
Farook Rahaman
Abstract:
We propose the back reaction to the charged galactic wormhole spacetime based on Yoshiaki Sofue's exponential dark matter density profile to find exact solutions. The charges act as an additional component to the static wormhole, which is primarily formed by the galactic dark matter density. Unlike traditional mass-based models, this solution incorporates charge effects within a realistic dark matter distribution, revealing unique interactions between dark matter, electromagnetic fields, and spacetime curvature. This study confirms the criteria for wormhole formation, designating it the "Charged Galactic Wormhole," and offers a new framework for investigating galactic structures, with potential observational signatures that deepen our understanding of dark matter and spacetime. Later, the proper radial distance and the embedding surface were also analyzed. Furthermore, the deflection of light around a charged galactic wormhole was investigated, along with a comprehensive review of the resulting image. The deflection of massive (chargeless) objects near charged galactic wormholes is studied using the Gauss-Bonnet and Rindler-Ishak methods, with a detailed comparison of the results from both approaches. Additionally, in both the Rindler-Ishak (RI) and Gauss-Bonnet (GB) methods, when v tends to 1, i.e., when the particle's velocity is comparable to the speed of light, the results from these approaches converge, producing the same outcome as strong gravitational lensing.
Submitted 20 March, 2025;
originally announced March 2025.
-
Observation of thermally activated coherent magnon-magnon coupling in a magnonic hybrid system
Authors:
Dinesh Wagle,
Yi Li,
Mojtaba Taghipour Kaffash,
Sergi Lendinez,
Mohammad Tomal Hossain,
Valentine Novosad,
M. Benjamin Jungfleisch
Abstract:
We experimentally demonstrate strong magnon-magnon coupling by thermal spin excitations in yttrium iron garnet/permalloy (YIG/Py) hybrid structures using microfocused Brillouin light scattering - an optical technique that enables the detection of zero-wavevector and higher-order wavevector spin waves in a broad frequency range. The thermally activated magnons in the bilayer lead to a hybrid excitation between magnon modes in the conductive Py layer with a wide wavevector range and the first perpendicular standing magnon modes in the insulating YIG layer, facilitated by strong interfacial exchange coupling. To further investigate this coupling, we compare the thermal magnon spectra with the results obtained from electrical excitation and detection methods, which primarily detect the uniform Py mode. The realization of coherent coupling between incoherent (thermal) magnons is important for advancing energy-efficient magnonic devices, particularly in classical as well as quantum spin-wave computing technologies.
Submitted 19 March, 2025;
originally announced March 2025.
-
Deep Neural Network-Based Voltage Prediction for Alkali-Metal-Ion Battery Materials
Authors:
Sk Mujaffar Hossain,
Namitha Anna Koshi,
Seung-Cheol Lee,
G. P Das,
Satadeep Bhattacharjee
Abstract:
Accurate voltage prediction of battery materials plays a pivotal role in advancing energy storage technologies and in the rational design of high-performance cathode materials. In this work, we present a deep neural network (DNN) model, built using PyTorch, to estimate the average voltage of cathode materials across Li-ion, Na-ion, and other alkali-metal-ion batteries. The model is trained on an extensive dataset from the Materials Project, incorporating a wide range of descriptors (structural, physical, chemical, electronic, thermodynamic, and battery-specific), ensuring a comprehensive representation of material properties. Our model exhibits strong predictive performance, as corroborated by first-principles density functional theory (DFT) calculations. The close alignment between the DNN predictions and DFT outcomes highlights the robustness and accuracy of our machine learning framework in effectively screening and identifying viable battery materials. Utilizing this validated model, we successfully propose novel Na-ion battery compositions, with their predicted behavior confirmed through rigorous computational assessment. By seamlessly integrating data-driven prediction with first-principles validation, this study presents an effective framework that significantly accelerates the discovery and optimization of advanced battery materials, contributing to the development of more reliable and efficient energy storage technologies.
Submitted 3 April, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Authors:
Mir Rayat Imtiaz Hossain,
Mennatullah Siam,
Leonid Sigal,
James J. Little
Abstract:
Large-scale vision-language models (VLMs), trained on extensive datasets of image-text pairs, exhibit strong multimodal understanding capabilities by implicitly learning associations between textual descriptions and image regions. This emergent ability enables zero-shot object detection and segmentation, using techniques that rely on text-image attention maps, without necessarily training on abundant labeled segmentation datasets. However, performance of such methods depends heavily on prompt engineering and manually selected layers or head choices for the attention layers. In this work, we demonstrate that, rather than relying solely on textual prompts, providing a single visual example for each category and fine-tuning the text-to-image attention layers and embeddings significantly improves the performance. Additionally, we propose learning an ensemble through few-shot fine-tuning across multiple layers and/or prompts. An entropy-based ranking and selection mechanism for text-to-image attention layers is proposed to identify the top-performing layers without the need for segmentation labels. This eliminates the need for hyper-parameter selection of text-to-image attention layers, providing a more flexible and scalable solution for open-vocabulary segmentation. We show that this approach yields strong zero-shot performance, further enhanced through fine-tuning with a single visual example. Moreover, we demonstrate that our method and findings are general and can be applied across various vision-language models (VLMs).
Submitted 13 March, 2025;
originally announced March 2025.
-
Cross-platform Prediction of Depression Treatment Outcome Using Location Sensory Data on Smartphones
Authors:
Soumyashree Sahoo,
Chinmaey Shende,
Md. Zakir Hossain,
Parit Patel,
Yushuo Niu,
Xinyu Wang,
Shweta Ware,
Jinbo Bi,
Jayesh Kamath,
Alexander Russel,
Dongjin Song,
Qian Yang,
Bing Wang
Abstract:
Currently, depression treatment relies on closely monitoring patients' response to treatment and adjusting the treatment as needed. Using self-reported or physician-administered questionnaires to monitor treatment response is, however, burdensome and costly, and suffers from recall bias. In this paper, we explore using location sensory data collected passively on smartphones to predict treatment outcome. To address heterogeneous data collection on Android and iOS phones, the two predominant smartphone platforms, we explore using domain adaptation techniques to map their data to a common feature space, and then use the data jointly to train machine learning models. Our results show that this domain adaptation approach can lead to significantly better prediction than using no domain adaptation. In addition, our results show that using location features and the baseline self-reported questionnaire score can lead to an F1 score of up to 0.67, comparable to that obtained using periodic self-reported questionnaires, indicating that using location data is a promising direction for predicting depression treatment outcome.
Submitted 10 March, 2025;
originally announced March 2025.
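The idea of mapping Android and iOS data to a common feature space can be sketched with a very simple stand-in technique: standardising each feature separately within each platform. The abstract does not say which domain adaptation method the authors use, so this per-platform standardisation, the function name, and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def standardize_per_platform(features, platform):
    """Z-score every feature column separately within each platform.

    This is a simple domain-alignment baseline (an assumption, not the
    paper's method): after the transform, each platform's features have
    zero mean and unit variance, so the two groups share a common scale.
    """
    features = np.asarray(features, dtype=float)
    platform = np.asarray(platform)
    aligned = np.empty_like(features)
    for name in np.unique(platform):
        rows = platform == name
        mean = features[rows].mean(axis=0)
        std = features[rows].std(axis=0)
        std[std == 0] = 1.0            # avoid division by zero for constant columns
        aligned[rows] = (features[rows] - mean) / std
    return aligned


# Synthetic location features with a platform-dependent shift and scale.
rng = np.random.default_rng(0)
android = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
ios = rng.normal(loc=-1.0, scale=0.5, size=(40, 3))

X = np.vstack([android, ios])
platform = np.array(["android"] * 50 + ["ios"] * 40)
X_aligned = standardize_per_platform(X, platform)
```

After alignment, `X_aligned` could be passed to a single classifier trained jointly on both platforms, which is the joint-training setup the abstract describes.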
-
Blockwise Post-processing in Satellite-based Quantum Key Distribution
Authors:
Minu J. Bae,
Nitish K. Panigrahy,
Prajit Dhara,
Md Zakir Hossain,
Walter O. Krawec,
Alexander Russell,
Don Towsley,
Bing Wang
Abstract:
Free-space satellite communication has significantly lower photon loss than terrestrial communication via optical fibers. Satellite-based quantum key distribution (QKD) leverages this advantage and provides a promising direction for achieving long-distance QKD. While the technological feasibility of satellite-based QKD has been demonstrated experimentally, optimizing the key rate remains a significant challenge. In this paper, we argue that improving classical post-processing is an important direction for increasing the key rate in satellite-based QKD, and that it can be easily incorporated into existing satellite systems. In particular, we explore one direction, blockwise post-processing, to address highly dynamic satellite channel conditions caused by various environmental factors. This blockwise strategy divides the raw key bits into individual blocks that have similar noise characteristics and processes them independently, in contrast to the traditional non-blockwise strategy, which treats all the raw key bits as a whole. Using a case study, we discuss the choice of blocks in the blockwise strategy and show that it can significantly outperform the non-blockwise strategy. Our study demonstrates the importance of post-processing in satellite QKD systems and presents several open problems in this direction.
Submitted 7 March, 2025;
originally announced March 2025.
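Why processing blocks independently can help is easy to see in a toy calculation. The sketch below assumes the standard asymptotic BB84 secret fraction, 1 - 2h(Q), where h is the binary entropy and Q the quantum bit error rate; the paper's own analysis may use a different (for example, finite-key) formula, and the block sizes and error rates here are made up for illustration.

```python
import math


def binary_entropy(q):
    """Binary entropy h(q) in bits; defined as 0 at the endpoints."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def secret_bits(n_bits, qber):
    """Asymptotic BB84 secret bits from n_bits raw bits (assumed formula).

    The result is clamped at zero: a block that is too noisy yields no key.
    """
    return max(0.0, n_bits * (1.0 - 2.0 * binary_entropy(qber)))


def blockwise_key(blocks):
    """Process each (n_bits, qber) block independently and add up the keys."""
    return sum(secret_bits(n, q) for n, q in blocks)


def non_blockwise_key(blocks):
    """Pool all raw bits and use the size-weighted average error rate."""
    total = sum(n for n, _ in blocks)
    avg_q = sum(n * q for n, q in blocks) / total
    return secret_bits(total, avg_q)


# Hypothetical pass with one quiet block and one noisy block.
blocks = [(1000, 0.02), (1000, 0.09)]
print(blockwise_key(blocks), non_blockwise_key(blocks))
```

Because h is concave, pooling blocks with different error rates charges every bit for the averaged noise, so under this assumed formula the blockwise total is never smaller than the pooled one.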
-
Design of a Microprocessors and Microcontrollers Laboratory Course Addressing Complex Engineering Problems and Activities
Authors:
Fahim Hafiz,
Md Jahidul Hoq Emon,
Md Abid Hossain,
Md. Saddam Hossain Mukta,
Salekul Islam,
Swakkhar Shatabda
Abstract:
This paper proposes a novel curriculum for the microprocessors and microcontrollers laboratory course. The proposed curriculum blends structured laboratory experiments with an open-ended project phase, addressing complex engineering problems and activities. Microprocessors and microcontrollers are ubiquitous in modern technology, driving applications across diverse fields. To prepare future engineers for Industry 4.0, effective educational approaches are crucial. The proposed lab enables students to perform hands-on experiments using advanced microprocessors and microcontrollers while applying their acquired knowledge by working in teams to tackle self-defined complex engineering problems that use these devices and sensors, which are often used in industry. Furthermore, this curriculum fosters multidisciplinary learning and equips students with problem-solving skills that can be applied in real-world scenarios. Despite recent technological advancements, traditional microprocessors and microcontrollers curricula often fail to capture the complexity of real-world applications. This curriculum addresses this critical gap by incorporating insights from experts in both industry and academia. It equips students with the skills and knowledge needed to thrive in this rapidly evolving technological landscape, preparing them for success upon graduation. The curriculum integrates project-based learning, where students define complex engineering problems for themselves. This approach actively engages students, fostering a deeper understanding and enhancing their learning capabilities. Statistical analysis shows that the proposed curriculum significantly improves student learning outcomes, particularly in their ability to formulate and solve complex engineering problems, as well as to engage in complex engineering activities.
Submitted 19 February, 2025;
originally announced March 2025.