Search | arXiv e-print repository

A Structured Bangla Dataset of Disease-Symptom Associations to Improve Diagnostic Accuracy

Authors: Abdullah Al Shafi, Rowzatul Zannat, Abdul Muntakim, Mahmudul Hasan

Abstract: Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationsh… ▽ More Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered through analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports. Only the verified medical sources were included in the dataset, while those from non-peer-reviewed and anecdotal sources were excluded. The dataset is structured in a tabular format, where the first column represents diseases, and the remaining columns represent symptoms. Each symptom cell contains a binary value (1 or 0), indicating whether a symptom is associated with a disease (1 for presence, 0 for absence). Thereby, this structured representation makes the dataset very useful for a wide range of applications, including machine learning-based disease prediction, clinical decision support systems, and epidemiological studies. Although there are some advancements in the field of disease-symptom datasets, there is a significant gap in structured datasets for the Bangla language. This dataset aims to bridge that gap by facilitating the development of multilingual medical informatics tools and improving disease prediction models for underrepresented linguistic communities. Further developments should include region-specific diseases and further fine-tuning of symptom associations for better diagnostic performance △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: Preprint

arXiv:2506.13538 [pdf, ps, other]

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

Authors: Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, Ahmed E. Hassan

Abstract: Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling-triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standa… ▽ More Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced tool calling-triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem, which has become the de facto standard with over eight million weekly SDK downloads. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability, warranting closer examination. Towards this end, we present the first large-scale empirical study of MCP servers. Using state-of-the-art health metrics and a hybrid analysis pipeline, combining a general-purpose static analysis tool with an MCP-specific scanner, we evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability. Despite MCP servers demonstrating strong health metrics, we identify eight distinct vulnerabilities -- only three overlapping with traditional software vulnerabilities. Additionally, 7.2% of servers contain general vulnerabilities and 5.5% exhibit MCP-specific tool poisoning. Regarding maintainability, while 66% exhibit code smells, 14.4% contain ten bug patterns overlapping with traditional open-source software projects. These findings highlight the need for MCP-specific vulnerability detection techniques while reaffirming the value of traditional analysis and refactoring practices. △ Less

Submitted 18 June, 2025; v1 submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.12561 [pdf, ps, other]

Parkinson's Disease Freezing of Gait (FoG) Symptom Detection Using Machine Learning from Wearable Sensor Data

Authors: Mahmudul Hasan

Abstract: Freezing of gait (FoG) is a special symptom found in patients with Parkinson's disease (PD). Patients who have FoG abruptly lose the capacity to walk as they normally would. Accelerometers worn by patients can record movement data during these episodes, and machine learning algorithms can be useful to categorize this information. Thus, the combination may be able to identify FoG in real time. In o… ▽ More Freezing of gait (FoG) is a special symptom found in patients with Parkinson's disease (PD). Patients who have FoG abruptly lose the capacity to walk as they normally would. Accelerometers worn by patients can record movement data during these episodes, and machine learning algorithms can be useful to categorize this information. Thus, the combination may be able to identify FoG in real time. In order to identify FoG events in accelerometer data, we introduce the Transformer Encoder-Bi-LSTM fusion model in this paper. The model's capability to differentiate between FoG episodes and normal movement was used to evaluate its performance, and on the Kaggle Parkinson's Freezing of Gait dataset, the proposed Transformer Encoder-Bi-LSTM fusion model produced 92.6% accuracy, 80.9% F1 score, and 52.06% in terms of mean average precision. The findings highlight how Deep Learning-based approaches may progress the field of FoG identification and help PD patients receive better treatments and management plans. △ Less

Submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.12404 [pdf, ps, other]

EXGnet: a single-lead explainable-AI guided multiresolution network with train-only quantitative features for trustworthy ECG arrhythmia classification

Authors: Tushar Talukder Showrav, Soyabul Islam Lincoln, Md. Kamrul Hasan

Abstract: Background: Deep learning has significantly advanced ECG arrhythmia classification, enabling high accuracy in detecting various cardiac conditions. The use of single-lead ECG systems is crucial for portable devices, as they offer convenience and accessibility for continuous monitoring in diverse settings. However, the interpretability and reliability of deep learning models in clinical application… ▽ More Background: Deep learning has significantly advanced ECG arrhythmia classification, enabling high accuracy in detecting various cardiac conditions. The use of single-lead ECG systems is crucial for portable devices, as they offer convenience and accessibility for continuous monitoring in diverse settings. However, the interpretability and reliability of deep learning models in clinical applications poses challenges due to their black-box nature. Methods: To address these challenges, we propose EXGnet, a single-lead, trustworthy ECG arrhythmia classification network that integrates multiresolution feature extraction with Explainable Artificial Intelligence (XAI) guidance and train only quantitative features. Results: Trained on two public datasets, including Chapman and Ningbo, EXGnet demonstrates superior performance through key metrics such as Accuracy, F1-score, Sensitivity, and Specificity. The proposed method achieved average five fold accuracy of 98.762%, and 96.932% and average F1-score of 97.910%, and 95.527% on the Chapman and Ningbo datasets, respectively. Conclusions: By employing XAI techniques, specifically Grad-CAM, the model provides visual insights into the relevant ECG segments it analyzes, thereby enhancing clinician trust in its predictions. While quantitative features further improve classification performance, they are not required during testing, making the model suitable for real-world applications. Overall, EXGnet not only achieves better classification accuracy but also addresses the critical need for interpretability in deep learning, facilitating broader adoption in portable ECG monitoring. △ Less

Submitted 14 June, 2025; originally announced June 2025.

Comments: 21 pages, 3 figures

arXiv:2506.10154 [pdf, ps, other]

Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME

Authors: Bidyarthi Paul, SM Musfiqur Rahman, Dipta Biswas, Md. Ziaul Hasan, Md. Zahid Hossain

Abstract: Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models: Linear SVM, KNN, and Random Forest with n-gram data f… ▽ More Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models: Linear SVM, KNN, and Random Forest with n-gram data from a TF-IDF vectorizer. We additionally investigated how PCA affects the reduction of dimensionality. Moreover, we utilized a BiLSTM model and AdaBoost to improve decision trees. To make our machine learning models easier to understand, we used LIME to explain the predictions of the AdaBoost classifier, which uses decision trees. With the goal of advancing sentiment analysis in languages with limited resources, our work examines various techniques to find efficient techniques for emotion identification in Bangla. △ Less

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2506.04585 [pdf, other]

MuSciClaims: Multimodal Scientific Claim Verification

Authors: Yash Kumar Lal, Manikanta Bandham, Mohammad Saqib Hasan, Apoorva Kashi, Mahnaz Koupaee, Niranjan Balasubramanian

Abstract: Assessing scientific claims requires identifying, extracting, and reasoning with multimodal data expressed in information-rich figures in scientific literature. Despite the large body of work in scientific QA, figure captioning, and other multimodal reasoning tasks over chart-based data, there are no readily usable multimodal benchmarks that directly test claim verification abilities. To remedy th… ▽ More Assessing scientific claims requires identifying, extracting, and reasoning with multimodal data expressed in information-rich figures in scientific literature. Despite the large body of work in scientific QA, figure captioning, and other multimodal reasoning tasks over chart-based data, there are no readily usable multimodal benchmarks that directly test claim verification abilities. To remedy this gap, we introduce a new benchmark MuSciClaims accompanied by diagnostics tasks. We automatically extract supported claims from scientific articles, which we manually perturb to produce contradicted claims. The perturbations are designed to test for a specific set of claim verification capabilities. We also introduce a suite of diagnostic tasks that help understand model failures. Our results show most vision-language models are poor (~0.3-0.5 F1), with even the best model only achieving 0.77 F1. They are also biased towards judging claims as supported, likely misunderstanding nuanced perturbations within the claims. Our diagnostics show models are bad at localizing correct evidence within figures, struggle with aggregating information across modalities, and often fail to understand basic components of the figure. △ Less

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2506.03870 [pdf, ps, other]

Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets

Authors: Mohd. Farhan Israk Soumik, Syed Mhamudul Hasan, Abdur R. Shahid

Abstract: The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By deve… ▽ More The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By developing early novel datasets specifically for this purpose, we empirically assess how different text modifications influence LLM-based detection. This capability suggests strong potential for Apple Intelligence's writing tools as privacy-preserving mechanisms. Our findings lay the groundwork for future adaptive rewriting systems capable of dynamically neutralizing sensitive emotional content to enhance user privacy. To the best of our knowledge, this research provides the first empirical analysis of Apple Intelligence's text-modification tools within a privacy-preservation context with the broader goal of developing on-device, user-centric privacy-preserving mechanisms to protect against LLMs-based advanced inference attacks on deployed systems. △ Less

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2506.03420 [pdf, ps, other]

Hybrid Ensemble of Segmentation-Assisted Classification and GBDT for Skin Cancer Detection with Engineered Metadata and Synthetic Lesions from ISIC 2024 Non-Dermoscopic 3D-TBP Images

Authors: Muhammad Zubair Hasan, Fahmida Yasmin Rifat

Abstract: Skin cancer is among the most prevalent and life-threatening diseases worldwide, with early detection being critical to patient outcomes. This work presents a hybrid machine and deep learning-based approach for classifying malignant and benign skin lesions using the SLICE-3D dataset from ISIC 2024, which comprises 401,059 cropped lesion images extracted from 3D Total Body Photography (TBP), emulat… ▽ More Skin cancer is among the most prevalent and life-threatening diseases worldwide, with early detection being critical to patient outcomes. This work presents a hybrid machine and deep learning-based approach for classifying malignant and benign skin lesions using the SLICE-3D dataset from ISIC 2024, which comprises 401,059 cropped lesion images extracted from 3D Total Body Photography (TBP), emulating non-dermoscopic, smartphone-like conditions. Our method combines vision transformers (EVA02) and our designed convolutional ViT hybrid (EdgeNeXtSAC) to extract robust features, employing a segmentation-assisted classification pipeline to enhance lesion localization. Predictions from these models are fused with a gradient-boosted decision tree (GBDT) ensemble enriched by engineered features and patient-specific relational metrics. To address class imbalance and improve generalization, we augment malignant cases with Stable Diffusion-generated synthetic lesions and apply a diagnosis-informed relabeling strategy to harmonize external datasets into a 3-class format. Using partial AUC (pAUC) above 80 percent true positive rate (TPR) as the evaluation metric, our approach achieves a pAUC of 0.1755 -- the highest among all configurations. These results underscore the potential of hybrid, interpretable AI systems for skin cancer triage in telemedicine and resource-constrained settings. △ Less

Submitted 3 June, 2025; originally announced June 2025.

Comments: Written as per the requirements of CVPR 2025. It is a 8 page paper without reference

arXiv:2506.01991 [pdf, ps, other]

Investigating Timing-Based Information Leakage in Data Flow-Driven Real-Time Systems

Authors: Mohammad Fakhruddin Babar, Zain A. H. Hammadeh, Mohammad Hamad, Monowar Hasan

Abstract: Leaking information about the execution behavior of critical real-time tasks may lead to serious consequences, including violations of temporal constraints and even severe failures. We study information leakage for a special class of real-time tasks that have two execution modes, namely, typical execution (which invokes the majority of times) and critical execution (to tackle exceptional condition… ▽ More Leaking information about the execution behavior of critical real-time tasks may lead to serious consequences, including violations of temporal constraints and even severe failures. We study information leakage for a special class of real-time tasks that have two execution modes, namely, typical execution (which invokes the majority of times) and critical execution (to tackle exceptional conditions). The data flow-driven applications inherit such a multimode execution model. In this paper, we investigate whether a low-priority "observer" task can infer the execution patterns of a high-priority "victim" task (especially the critical executions). We develop a new statistical analysis technique and show that by analyzing the response times of the low-priority task, it becomes possible to extract the execution behavior of the high-priority task. We test our approach against a random selection technique that arbitrarily classifies a job as critical. We find that correlating the observer's response times with the victim's jobs can result in higher precision in identifying critical invocations compared to a random guess. We conduct extensive evaluations with systemically generated workloads, including a case study using a UAV autopilot (ArduPilot) taskset parameters. We found that our inference algorithm can achieve relatively low false positive rates (less than 25%) with relatively low footprint (1 MB memory and 50 ms timing overhead on a Raspberry Pi 4 platform). We further demonstrate the feasibility of inference on two cyber-physical platforms: an off-the-shelf manufacturing robot and a custom-built surveillance system. △ Less

Submitted 3 June, 2025; v1 submitted 17 May, 2025; originally announced June 2025.

arXiv:2506.00831 [pdf]

A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems

Authors: M Sabbir Salek, Mashrur Chowdhury, Muhaimin Bin Munir, Yuchen Cai, Mohammad Imtiaz Hasan, Jean-Michel Tine, Latifur Khan, Mizanur Rahman

Abstract: Modern transportation systems rely on cyber-physical systems (CPS), where cyber systems interact seamlessly with physical systems like transportation-related sensors and actuators to enhance safety, mobility, and energy efficiency. However, growing automation and connectivity increase exposure to cyber vulnerabilities. Existing threat modeling frameworks for transportation CPS are often limited in… ▽ More Modern transportation systems rely on cyber-physical systems (CPS), where cyber systems interact seamlessly with physical systems like transportation-related sensors and actuators to enhance safety, mobility, and energy efficiency. However, growing automation and connectivity increase exposure to cyber vulnerabilities. Existing threat modeling frameworks for transportation CPS are often limited in scope, resource-intensive, and dependent on significant cybersecurity expertise. To address these gaps, we present TraCR-TMF (Transportation Cybersecurity and Resiliency Threat Modeling Framework), a large language model (LLM)-based framework that minimizes expert intervention. TraCR-TMF identifies threats, potential attack techniques, and corresponding countermeasures by leveraging the MITRE ATT&CK matrix through three LLM-based approaches: (i) a retrieval-augmented generation (RAG) method requiring no expert input, (ii) an in-context learning approach requiring low expert input, and (iii) a supervised fine-tuning method requiring moderate expert input. TraCR-TMF also maps attack paths to critical assets by analyzing vulnerabilities using a customized LLM. The framework was evaluated in two scenarios. First, it identified relevant attack techniques across transportation CPS applications, with 90% precision as validated by experts. Second, using a fine-tuned LLM, it successfully predicted multiple exploitations including lateral movement, data exfiltration, and ransomware-related encryption that occurred during a major real-world cyberattack incident. These results demonstrate TraCR-TMF's effectiveness in CPS threat modeling, its reduced reliance on cybersecurity expertise, and its adaptability across CPS domains. △ Less

Submitted 1 June, 2025; originally announced June 2025.

arXiv:2505.24586 [pdf, ps, other]

All-sky search for individual Primordial Black Hole bursts with LHAASO

Authors: Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, S. H. Chen , et al. (293 additional authors not shown)

Abstract: Primordial Black Holes~(PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$~g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for… ▽ More Primordial Black Holes~(PBHs) are hypothetical black holes with a wide range of masses that formed in the early universe. As a result, they may play an important cosmological role and provide a unique probe of the early universe. A PBH with an initial mass of approximately $10^{15}$~g is expected to explode today in a final burst of Hawking radiation. In this work, we conduct an all-sky search for individual PBH burst events using the data collected from March 2021 to July 2024 by the Water Cherenkov Detector Array of the Large High Altitude Air Shower Observatory (LHAASO). Three PBH burst durations, 10~s, 20~s, and 100~s, are searched, with no significant PBH bursts observed. The upper limit on the local PBH burst rate density is set to be as low as 181~pc$^{-3}$~yr$^{-1}$ at 99$\%$ confidence level, representing the most stringent limit achieved to date. △ Less

Submitted 2 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

Comments: 8 pages, 2 figures

arXiv:2505.23944 [pdf, ps, other]

Retrieval Augmented Generation based Large Language Models for Causality Mining

Authors: Thushara Manjari Naduvilakandy, Hyeju Jang, Mohammad Al Hasan

Abstract: Causality detection and mining are important tasks in information retrieval due to their enormous use in information extraction, and knowledge graph construction. To solve these tasks, in existing literature there exist several solutions -- both unsupervised and supervised. However, the unsupervised methods suffer from poor performance and they often require significant human intervention for caus… ▽ More Causality detection and mining are important tasks in information retrieval due to their enormous use in information extraction, and knowledge graph construction. To solve these tasks, in existing literature there exist several solutions -- both unsupervised and supervised. However, the unsupervised methods suffer from poor performance and they often require significant human intervention for causal rule selection, leading to poor generalization across different domains. On the other hand, supervised methods suffer from the lack of large training datasets. Recently, large language models (LLMs) with effective prompt engineering are found to be effective to overcome the issue of unavailability of large training dataset. Yet, in existing literature, there does not exist comprehensive works on causality detection and mining using LLM prompting. In this paper, we present several retrieval-augmented generation (RAG) based dynamic prompting schemes to enhance LLM performance in causality detection and extraction tasks. Extensive experiments over three datasets and five LLMs validate the superiority of our proposed RAG-based dynamic prompting over other static prompting schemes. △ Less

Submitted 29 May, 2025; originally announced May 2025.

Comments: 13 pages, 6 figures, published in knowledgeNLP-NAACL2025

arXiv:2505.23938 [pdf, other]

Digital Forensic Investigation of the ChatGPT Windows Application

Authors: Malithi Wanniarachchi Kankanamge, Nick McKenna, Santiago Carmona, Syed Mhamudul Hasan, Abdur R. Shahid, Ahmed Imteaj

Abstract: The ChatGPT Windows application offers better user interaction in the Windows operating system (OS) by enhancing productivity and streamlining the workflow of ChatGPT's utilization. However, there are potential misuses associated with this application that require rigorous forensic analysis. This study presents a holistic forensic analysis of the ChatGPT Windows application, focusing on identifyin… ▽ More The ChatGPT Windows application offers better user interaction in the Windows operating system (OS) by enhancing productivity and streamlining the workflow of ChatGPT's utilization. However, there are potential misuses associated with this application that require rigorous forensic analysis. This study presents a holistic forensic analysis of the ChatGPT Windows application, focusing on identifying and recovering digital artifacts for investigative purposes. With the use of widely popular and openly available digital forensics tools such as Autopsy, FTK Imager, Magnet RAM Capture, Wireshark, and Hex Workshop, this research explores different methods to extract and analyze cache, chat logs, metadata, and network traffic from the application. Our key findings also demonstrate the history of the application's chat, user interactions, and system-level traces that can be recovered even after deletion, providing critical insights into the crime investigation and, thus, documenting and outlining a potential misuse report for digital forensics. △ Less

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.21673 [pdf]

Supervised Link Prediction in Co-Authorship Networks Based on Author Node-Based Features

Authors: Doaa Hassan, Mohammad Al Hasan

Abstract: Predicting the emergence of future research collaborations between authors in academic social networks (SNs) is a very effective example that demonstrates the link prediction problem. This problem refers to predicting the potential existence or absence of a link between a pair of nodes (authors) on the co-authorship network. Various similarity and aggregation metrics were proposed in the literatur… ▽ More Predicting the emergence of future research collaborations between authors in academic social networks (SNs) is a very effective example that demonstrates the link prediction problem. This problem refers to predicting the potential existence or absence of a link between a pair of nodes (authors) on the co-authorship network. Various similarity and aggregation metrics were proposed in the literature for predicting the potential link between two authors on such networks. However, the relevant research did not investigate the impact of similarity of research interests of two authors or the similarity of their affiliations on the performance of predicting the potential link between them. Additionally, the impact of the aggregation of the research performance indices of two authors on link prediction performance was not highlighted. To this end, in this paper we propose an integrative supervised learning framework for predicting potential collaboration in co-authorship network based on similarity of the research interests and the similarity of the affiliations of each pair of authors in this network. Moreover, our proposed framework integrates the aggregation of research performance indices of each author pair and the similarity between the two authors nodes with the research interest and affiliation similarity as four metrics for predicting the potential link between each two authors. Our experimental results obtained from applying our proposed link prediction approach to the two largest connected graphs of two huge academic co-authorship networks, namely ArnetMiner and DBLP, show the great performance of this approach in predicting potential links between two authors on large-scale academic SNs. △ Less

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.20554 [pdf, ps, other]

A Model of Ride Dispatch in Informal Market under Rival Entry

Authors: Md Mahadi Hasan

Abstract: I develop a continuous-time model in which an incumbent batch-service provider faces stochastic passenger arrivals and must decide when to dispatch under the threat of customer defection to a faster entrant. The incumbent's problem is formalized as a trade-off between departure frequency and load maximization, with the option to accept mid-route pickups. I characterize the equilibrium dispatch str… ▽ More I develop a continuous-time model in which an incumbent batch-service provider faces stochastic passenger arrivals and must decide when to dispatch under the threat of customer defection to a faster entrant. The incumbent's problem is formalized as a trade-off between departure frequency and load maximization, with the option to accept mid-route pickups. I characterize the equilibrium dispatch strategy and show that increased competitive pressure strictly reduces the feasible departure threshold, leading to more frequent departures with smaller passenger loads. Longer travel times tend to raise the unconstrained optimal threshold, but realized dispatch behavior also depends on passenger tolerance for delay. Endogenizing demand by letting the arrival rate fall with expected waiting time yields an interior optimum, rationalizing why incumbents now (i) depart partially full and (ii) accept mid-route riders. Comparative statics show that the optimal threshold tends to increase with travel time under a mild regularity condition and decreases with competitive intensity. △ Less

Submitted 26 May, 2025; originally announced May 2025.

Comments: 23 pages, 2 figures

arXiv:2505.19163 [pdf, ps, other]

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs

Authors: Firoj Alam, Md Arid Hasan, Shammur Absar Chowdhury

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various disciplines and tasks. However, benchmarking their capabilities with multilingual spoken queries remains largely unexplored. In this study, we introduce SpokenNativQA, the first multilingual and culturally aligned spoken question-answering (SQA) dataset designed to evaluate LLMs in real-world conversational settin… ▽ More Large Language Models (LLMs) have demonstrated remarkable performance across various disciplines and tasks. However, benchmarking their capabilities with multilingual spoken queries remains largely unexplored. In this study, we introduce SpokenNativQA, the first multilingual and culturally aligned spoken question-answering (SQA) dataset designed to evaluate LLMs in real-world conversational settings. The dataset comprises approximately 33,000 naturally spoken questions and answers in multiple languages, including low-resource and dialect-rich languages, providing a robust benchmark for assessing LLM performance in speech-based interactions. SpokenNativQA addresses the limitations of text-based QA datasets by incorporating speech variability, accents, and linguistic diversity. We benchmark different ASR systems and LLMs for SQA and present our findings. We released the data at (https://huggingface.co/datasets/QCRI/SpokenNativQA) and the experimental scripts at (https://llmebench.qcri.org/) for the research community. △ Less

Submitted 25 May, 2025; originally announced May 2025.

Comments: Spoken Question Answering, Multilingual LLMs, Speech-based Evaluation, Dialectal Speech, Low-resource Languages, Multimodal Benchmarking, Conversational AI, Speech-to-Text QA, Real-world Interaction, Natural Language Understanding

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2505.18865 [pdf, ps, other]

SW-ViT: A Spatio-Temporal Vision Transformer Network with Post Denoiser for Sequential Multi-Push Ultrasound Shear Wave Elastography

Authors: Ahsan Habib Akash, MD Jahin Alam, Md. Kamrul Hasan

Abstract: Objective: Ultrasound Shear Wave Elastography (SWE) demonstrates great potential in assessing soft-tissue pathology by mapping tissue stiffness, which is linked to malignancy. Traditional SWE methods have shown promise in estimating tissue elasticity, yet their susceptibility to noise interference, reliance on limited training data, and inability to generate segmentation masks concurrently present… ▽ More Objective: Ultrasound Shear Wave Elastography (SWE) demonstrates great potential in assessing soft-tissue pathology by mapping tissue stiffness, which is linked to malignancy. Traditional SWE methods have shown promise in estimating tissue elasticity, yet their susceptibility to noise interference, reliance on limited training data, and inability to generate segmentation masks concurrently present notable challenges to accuracy and reliability. Approach: In this paper, we propose SW-ViT, a novel two-stage deep learning framework for SWE that integrates a CNN-Spatio-Temporal Vision Transformer-based reconstruction network with an efficient Transformer-based post-denoising network. The first stage uses a 3D ResNet encoder with multi-resolution spatio-temporal Transformer blocks that capture spatial and temporal features, followed by a squeeze-and-excitation attention decoder that reconstructs 2D stiffness maps. To address data limitations, a patch-based training strategy is adopted for localized learning and reconstruction. In the second stage, a denoising network with a shared encoder and dual decoders processes inclusion and background regions to produce a refined stiffness map and segmentation mask. A hybrid loss combining regional, smoothness, fusion, and Intersection over Union (IoU) components ensures improvements in both reconstruction and segmentation. Results: On simulated data, our method achieves PSNR of 32.68 dB, CNR of 46.78 dB, and SSIM of 0.995. On phantom data, results include PSNR of 21.11 dB, CNR of 42.14 dB, and SSIM of 0.936. Segmentation IoU values reach 0.949 (simulation) and 0.738 (phantom) with ASSD values being 0.184 and 1.011, respectively. Significance: SW-ViT delivers robust, high-quality elasticity map estimates from noisy SWE data and holds clear promise for clinical application. △ Less

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.18845 [pdf, ps, other]

Multi-Party Conversational Agents: A Survey

Authors: Sagar Sapkota, Mohammad Saqib Hasan, Mubarak Shah, Santu Karmaker

Abstract: Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, designing MPCAs faces additional challenges due to the need to interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each parti… ▽ More Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, designing MPCAs faces additional challenges due to the need to interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each participants' mental states? (State of Mind Modeling); 2) Can they properly understand the dialogue content? (Semantic Understanding); and 3) Can they reason about and predict future conversation flow? (Agent Action Modeling). We review methods ranging from classical machine learning to Large Language Models (LLMs) and multi-modal systems. Our analysis underscores Theory of Mind (ToM) as essential for building intelligent MPCAs and highlights multi-modal understanding as a promising yet underexplored direction. Finally, this survey offers guidance to future researchers on developing more capable MPCAs. △ Less

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.18725 [pdf, other]

doi 10.1109/ICCIT64611.2024.11021905

Deep Learning for Breast Cancer Detection: Comparative Analysis of ConvNeXT and EfficientNet

Authors: Mahmudul Hasan

Abstract: Breast cancer is the most commonly occurring cancer worldwide. This cancer caused 670,000 deaths globally in 2022, as reported by the WHO. Yet since health officials began routine mammography screening in age groups deemed at risk in the 1980s, breast cancer mortality has decreased by 40% in high-income nations. Every day, a greater and greater number of people are receiving a breast cancer diagno… ▽ More Breast cancer is the most commonly occurring cancer worldwide. This cancer caused 670,000 deaths globally in 2022, as reported by the WHO. Yet since health officials began routine mammography screening in age groups deemed at risk in the 1980s, breast cancer mortality has decreased by 40% in high-income nations. Every day, a greater and greater number of people are receiving a breast cancer diagnosis. Reducing cancer-related deaths requires early detection and treatment. This paper compares two convolutional neural networks called ConvNeXT and EfficientNet to predict the likelihood of cancer in mammograms from screening exams. Preprocessing of the images, classification, and performance evaluation are main parts of the whole procedure. Several evaluation metrics were used to compare and evaluate the performance of the models. The result shows that ConvNeXT generates better results with a 94.33% AUC score, 93.36% accuracy, and 95.13% F-score compared to EfficientNet with a 92.34% AUC score, 91.47% accuracy, and 93.06% F-score on RSNA screening mammography breast cancer dataset. △ Less

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.14447 [pdf, ps, other]

First Identification and Precise Spectral Measurement of the Proton Component in the Cosmic-Ray `Knee'

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, C. M. Cai, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, G. H. Chen, H. X. Chen, Liang Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (292 additional authors not shown)

Abstract: We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and syst… ▽ More We report the first high-purity identification of cosmic-ray (CR) protons and a precise measurement of their energy spectrum from 0.15 to 12 PeV using the Large High Altitude Air Shower Observatory (LHAASO). Abundant event statistics, combined with the simultaneous detection of electrons/photons, muons, and Cherenkov light in air showers, enable spectroscopic measurements with statistical and systematic accuracy comparable to satellite data at lower energies. The proton spectrum shows significant hardening relative to low-energy extrapolations, culminating at 3 PeV, followed by sharp softening. This distinct spectral structure - closely aligned with the knee in the all-particle spectrum - points to the emergence of a new CR component at PeV energies, likely linked to the dozens of PeVatrons recently discovered by LHAASO, and offers crucial clues to the origin of Galactic cosmic rays. △ Less

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.12651 [pdf, ps, other]

$\texttt{DIAMONDs}$: A Dataset for $\mathbb{D}$ynamic $\mathbb{I}$nformation $\mathbb{A}$nd $\mathbb{M}$ental modeling $\mathbb{O}$f $\mathbb{N}$umeric $\mathbb{D}$iscussions

Authors: Sayontan Ghosh, Mahnaz Koupaee, Yash Kumar Lal, Pegah Alipoormolabashi, Mohammad Saqib Hasan, Jun Seok Kang, Niranjan Balasubramanian

Abstract: Understanding multiparty conversations demands robust Theory of Mind (ToM) capabilities, including the ability to track dynamic information, manage knowledge asymmetries, and distinguish relevant information across extended exchanges. To advance ToM evaluation in such settings, we present a carefully designed scalable methodology for generating high-quality benchmark conversation-question pairs wi… ▽ More Understanding multiparty conversations demands robust Theory of Mind (ToM) capabilities, including the ability to track dynamic information, manage knowledge asymmetries, and distinguish relevant information across extended exchanges. To advance ToM evaluation in such settings, we present a carefully designed scalable methodology for generating high-quality benchmark conversation-question pairs with these characteristics. Using this methodology, we create $\texttt{DIAMONDs}$, a new conversational QA dataset covering common business, financial or other group interactions. In these goal-oriented conversations, participants often have to track certain numerical quantities (say $\textit{expected profit}$) of interest that can be derived from other variable quantities (like $\textit{marketing expenses, expected sales, salary}$, etc.), whose values also change over the course of the conversation. $\texttt{DIAMONDs}$ questions pose simple numerical reasoning problems over such quantities of interest (e.g., $\textit{funds required for charity events, expected company profit next quarter}$, etc.) in the context of the information exchanged in conversations. This allows for precisely evaluating ToM capabilities for carefully tracking and reasoning over participants' knowledge states. Our evaluation of state-of-the-art language models reveals significant challenges in handling participant-centric reasoning, specifically in situations where participants have false beliefs. Models also struggle with conversations containing distractors and show limited ability to identify scenarios with insufficient information. These findings highlight current models' ToM limitations in handling real-world multi-party conversations. △ Less

Submitted 18 May, 2025; originally announced May 2025.

arXiv:2505.09187 [pdf, ps, other]

Wavefunction-Free Approach for Predicting Nonlinear Responses in Weyl Semimetals

Authors: Mohammad Yahyavi, Ilya Belopolski, Yuanjun Jin, Md Shafayat Hossain, Yilin Zhao, Jinyang Ni, Naizhou Wang, Yi-Chun Hung, Zi-Jia Cheng, Tyler A. Cochran, Tay-Rong Chang, Wei-bo Gao, Su-Yang Xu, Jia-Xin Yin, Qiong Ma, M. Zahid Hasan, Arun Bansil, Naoto Nagaosa, Guoqing Chang

Abstract: By sidestepping the intractable calculations of many-body wavefunctions, density functional theory (DFT) has revolutionized the prediction of ground states of materials. However, predicting nonlinear responses--critical for next-generation quantum devices--still relies heavily on explicit wavefunctions, limiting computational efficiency. In this letter, using the circular photogalvanic effect (CPG… ▽ More By sidestepping the intractable calculations of many-body wavefunctions, density functional theory (DFT) has revolutionized the prediction of ground states of materials. However, predicting nonlinear responses--critical for next-generation quantum devices--still relies heavily on explicit wavefunctions, limiting computational efficiency. In this letter, using the circular photogalvanic effect (CPGE) in Weyl semimetals as a representative example, we realize a 1000-fold computational speedup by eliminating the explicit dependence on wavefunctions. Our approach leverages the one-to-one correspondence between free parameters of Weyl fermions and the associated responses to obtain precise wavefunction-free formulations. Applying our methodology, we systematically investigated known Weyl semimetals and revealed that Ta$_3$S$_2$ exhibits photocurrents an order of magnitude greater than those observed in TaAs, with potential for an additional order-of-magnitude enhancement under strain. Our work paves the way for substantially more efficient screening and optimization of nonlinear electromagnetic properties of topological quantum materials. △ Less

Submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.08985 [pdf, other]

Position-Normal Manifold for Efficient Glint Rendering on High-Resolution Normal Maps

Authors: Liwen Wu, Fujun Luan, Miloš Hašan, Ravi Ramamoorthi

Abstract: Detailed microstructures on specular objects often exhibit intriguing glinty patterns under high-frequency lighting, which is challenging to render using a conventional normal-mapped BRDF. In this paper, we present a manifold-based formulation of the glint normal distribution functions (NDF) that precisely captures the surface normal distributions over queried footprints. The manifold-based formul… ▽ More Detailed microstructures on specular objects often exhibit intriguing glinty patterns under high-frequency lighting, which is challenging to render using a conventional normal-mapped BRDF. In this paper, we present a manifold-based formulation of the glint normal distribution functions (NDF) that precisely captures the surface normal distributions over queried footprints. The manifold-based formulation transfers the integration for the glint NDF construction to a problem of mesh intersections. Compared to previous works that rely on complex numerical approximations, our integral solution is exact and much simpler to compute, which also allows an easy adaptation of a mesh clustering hierarchy to accelerate the NDF evaluation of large footprints. Our performance and quality analysis shows that our NDF formulation achieves similar glinty appearance compared to the baselines but is an order of magnitude faster. Within this framework, we further present a novel derivation of analytical shadow-masking for normal-mapped diffuse surfaces -- a component that is often ignored in previous works. △ Less

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2505.08889 [pdf, other]

doi 10.1145/3731173

IntrinsicEdit: Precise generative image manipulation in intrinsic space

Authors: Linjie Lyu, Valentin Deschaintre, Yannick Hold-Geoffroy, Miloš Hašan, Jae Shin Yoon, Thomas Leimkühler, Christian Theobalt, Iliyan Georgiev

Abstract: Generative diffusion models have advanced image editing with high-quality results and intuitive interfaces such as prompts and semantic drawing. However, these interfaces lack precise control, and the associated methods typically specialize on a single editing task. We introduce a versatile, generative workflow that operates in an intrinsic-image latent space, enabling semantic, local manipulation… ▽ More Generative diffusion models have advanced image editing with high-quality results and intuitive interfaces such as prompts and semantic drawing. However, these interfaces lack precise control, and the associated methods typically specialize on a single editing task. We introduce a versatile, generative workflow that operates in an intrinsic-image latent space, enabling semantic, local manipulation with pixel precision for a range of editing operations. Building atop the RGB-X diffusion framework, we address key challenges of identity preservation and intrinsic-channel entanglement. By incorporating exact diffusion inversion and disentangled channel manipulation, we enable precise, efficient editing with automatic resolution of global illumination effects -- all without additional data collection or model fine-tuning. We demonstrate state-of-the-art performance across a variety of tasks on complex images, including color and texture adjustments, object insertion and removal, global relighting, and their combinations. △ Less

Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

Comments: SIGGRAPH 2025 Journal track

arXiv:2505.07407 [pdf, other]

Scaling of wall pressure and the peak of streamwise turbulence intensity in compressible wall flows

Authors: Asif Manzoor Hasan, Pedro Costa, Johan Larsson, Rene Pecnik

Abstract: This paper develops scaling laws for wall-pressure root-mean-square (r.m.s.) and the peak of streamwise turbulence intensity, accounting for both variable-property and intrinsic compressibility effects -- those associated with changes in fluid volume due to pressure variations. To develop such scaling laws, we express the target quantities as an expansion series in powers of an appropriately defin… ▽ More This paper develops scaling laws for wall-pressure root-mean-square (r.m.s.) and the peak of streamwise turbulence intensity, accounting for both variable-property and intrinsic compressibility effects -- those associated with changes in fluid volume due to pressure variations. To develop such scaling laws, we express the target quantities as an expansion series in powers of an appropriately defined Mach number. The leading-order term is represented using the scaling relations developed for incompressible flows, but with an effective Reynolds number. Higher-order terms capture intrinsic compressibility effects and are modeled as constant coefficients, calibrated using flow cases specifically designed to isolate these effects. The resulting scaling relations are shown to be accurate for a wide range of turbulent channel flows and boundary layers. △ Less

Submitted 13 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

Comments: 10 pages, 2 figures

arXiv:2505.07036 [pdf]

Predicting Diabetes Using Machine Learning: A Comparative Study of Classifiers

Authors: Mahade Hasan, Farhana Yasmin

Abstract: Diabetes remains a significant health challenge globally, contributing to severe complications like kidney disease, vision loss, and heart issues. The application of machine learning (ML) in healthcare enables efficient and accurate disease prediction, offering avenues for early intervention and patient support. Our study introduces an innovative diabetes prediction framework, leveraging both trad… ▽ More Diabetes remains a significant health challenge globally, contributing to severe complications like kidney disease, vision loss, and heart issues. The application of machine learning (ML) in healthcare enables efficient and accurate disease prediction, offering avenues for early intervention and patient support. Our study introduces an innovative diabetes prediction framework, leveraging both traditional ML techniques such as Logistic Regression, SVM, Naïve Bayes, and Random Forest and advanced ensemble methods like AdaBoost, Gradient Boosting, Extra Trees, and XGBoost. Central to our approach is the development of a novel model, DNet, a hybrid architecture combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) layers for effective feature extraction and sequential learning. The DNet model comprises an initial convolutional block for capturing essential features, followed by a residual block with skip connections to facilitate efficient information flow. Batch Normalization and Dropout are employed for robust regularization, and an LSTM layer captures temporal dependencies within the data. Using a Kaggle-sourced real-world diabetes dataset, our model evaluation spans cross-validation accuracy, precision, recall, F1 score, and ROC-AUC. Among the models, DNet demonstrates the highest efficacy with an accuracy of 99.79% and an AUC-ROC of 99.98%, establishing its potential for superior diabetes prediction. This robust hybrid architecture showcases the value of combining CNN and LSTM layers, emphasizing its applicability in medical diagnostics and disease prediction tasks. △ Less

Submitted 11 May, 2025; originally announced May 2025.

arXiv:2505.06528 [pdf, ps, other]

Unmasking Deep Fakes: Leveraging Deep Learning for Video Authenticity Detection

Authors: Mahmudul Hasan, Sadia Ruhama, Sabrina Tajnim Sithi, Chowdhury Mohammad Mutamir Samit, Oindrila Saha

Abstract: Deepfake videos, produced through advanced artificial intelligence methods now a days, pose a new challenge to the truthfulness of the digital media. As Deepfake becomes more convincing day by day, detecting them requires advanced methods capable of identifying subtle inconsistencies. The primary motivation of this paper is to recognize deepfake videos using deep learning techniques, specifically… ▽ More Deepfake videos, produced through advanced artificial intelligence methods now a days, pose a new challenge to the truthfulness of the digital media. As Deepfake becomes more convincing day by day, detecting them requires advanced methods capable of identifying subtle inconsistencies. The primary motivation of this paper is to recognize deepfake videos using deep learning techniques, specifically by using convolutional neural networks. Deep learning excels in pattern recognition, hence, makes it an ideal approach for detecting the intricate manipulations in deepfakes. In this paper, we consider using MTCNN as a face detector and EfficientNet-B5 as encoder model to predict if a video is deepfake or not. We utilize training and evaluation dataset from Kaggle DFDC. The results shows that our deepfake detection model acquired 42.78% log loss, 93.80% AUC and 86.82% F1 score on kaggle's DFDC dataset. △ Less

Submitted 15 June, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

arXiv:2505.06502 [pdf, ps, other]

PC-SRGAN: Physically Consistent Super-Resolution Generative Adversarial Network for General Transient Simulations

Authors: Md Rakibul Hasan, Pouria Behnoudfar, Dan MacKinlay, Thomas Poulet

Abstract: Machine Learning, particularly Generative Adversarial Networks (GANs), has revolutionised Super Resolution (SR). However, generated images often lack physical meaningfulness, which is essential for scientific applications. Our approach, PC-SRGAN, enhances image resolution while ensuring physical consistency for interpretable simulations. PC-SRGAN significantly improves both the Peak Signal-to-Nois… ▽ More Machine Learning, particularly Generative Adversarial Networks (GANs), has revolutionised Super Resolution (SR). However, generated images often lack physical meaningfulness, which is essential for scientific applications. Our approach, PC-SRGAN, enhances image resolution while ensuring physical consistency for interpretable simulations. PC-SRGAN significantly improves both the Peak Signal-to-Noise Ratio and the Structural Similarity Index Measure compared to conventional methods, even with limited training data (e.g., only 13% of training data required for SRGAN). Beyond SR, PC-SRGAN augments physically meaningful machine learning, incorporating numerically justified time integrators and advanced quality metrics. These advancements promise reliable and causal machine-learning models in scientific domains. A significant advantage of PC-SRGAN over conventional SR techniques is its physical consistency, which makes it a viable surrogate model for time-dependent problems. PC-SRGAN advances scientific machine learning, offering improved accuracy and efficiency for image processing, enhanced process understanding, and broader applications to scientific research. The source codes and data will be made publicly available at https://github.com/hasan-rakibul/PC-SRGAN upon acceptance of this paper. △ Less

Submitted 10 May, 2025; originally announced May 2025.

arXiv:2505.06454 [pdf, ps, other]

Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning

Authors: Syed Mhamudul Hasan, Hussein Zangoti, Iraklis Anagnostopoulos, Abdur R. Shahid

Abstract: Recent studies have shown that sponge attacks can significantly increase the energy consumption and inference latency of deep neural networks (DNNs). However, prior work has focused primarily on computer vision and natural language processing tasks, overlooking the growing use of lightweight AI models in sensing-based applications on resource-constrained devices, such as those in Internet of Thing… ▽ More Recent studies have shown that sponge attacks can significantly increase the energy consumption and inference latency of deep neural networks (DNNs). However, prior work has focused primarily on computer vision and natural language processing tasks, overlooking the growing use of lightweight AI models in sensing-based applications on resource-constrained devices, such as those in Internet of Things (IoT) environments. These attacks pose serious threats of energy depletion and latency degradation in systems where limited battery capacity and real-time responsiveness are critical for reliable operation. This paper makes two key contributions. First, we present the first systematic exploration of energy-latency sponge attacks targeting sensing-based AI models. Using wearable sensing-based AI as a case study, we demonstrate that sponge attacks can substantially degrade performance by increasing energy consumption, leading to faster battery drain, and by prolonging inference latency. Second, to mitigate such attacks, we investigate model pruning, a widely adopted compression technique for resource-constrained AI, as a potential defense. Our experiments show that pruning-induced sparsity significantly improves model resilience against sponge poisoning. We also quantify the trade-offs between model efficiency and attack resilience, offering insights into the security implications of model compression in sensing-based AI systems deployed in IoT environments. △ Less

Submitted 9 May, 2025; originally announced May 2025.

arXiv:2505.02694 [pdf, other]

AI Standardized Patient Improves Human Conversations in Advanced Cancer Care

Authors: Kurtis Haut, Masum Hasan, Thomas Carroll, Ronald Epstein, Taylan Sen, Ehsan Hoque

Abstract: Serious illness communication (SIC) in end-of-life care faces challenges such as emotional stress, cultural barriers, and balancing hope with honesty. Despite its importance, one of the few available ways for clinicians to practice SIC is with standardized patients, which is expensive, time-consuming, and inflexible. In this paper, we present SOPHIE, an AI-powered standardized patient simulation a… ▽ More Serious illness communication (SIC) in end-of-life care faces challenges such as emotional stress, cultural barriers, and balancing hope with honesty. Despite its importance, one of the few available ways for clinicians to practice SIC is with standardized patients, which is expensive, time-consuming, and inflexible. In this paper, we present SOPHIE, an AI-powered standardized patient simulation and automated feedback system. SOPHIE combines large language models (LLMs), a lifelike virtual avatar, and automated, personalized feedback based on clinical literature to provide remote, on-demand SIC training. In a randomized control study with healthcare students and professionals, SOPHIE users demonstrated significant improvement across three critical SIC domains: Empathize, Be Explicit, and Empower. These results suggest that AI-driven tools can enhance complex interpersonal communication skills, offering scalable, accessible solutions to address a critical gap in clinician education. △ Less

Submitted 5 May, 2025; originally announced May 2025.

Comments: 20 pages, 6 figures, 4 tables, submitting to New England Journal of Medicine (NEJM)

arXiv:2504.14764 [pdf, other]

Steering Semantic Data Processing With DocWrangler

Authors: Shreya Shankar, Bhavya Chopra, Mawil Hasan, Stephen Lee, Björn Hartmann, Joseph M. Hellerstein, Aditya G. Parameswaran, Eugene Wu

Abstract: Unstructured text has long been difficult to automatically analyze at scale. Large language models (LLMs) now offer a way forward by enabling {\em semantic data processing}, where familiar data processing operators (e.g., map, reduce, filter) are powered by LLMs instead of code. However, building effective semantic data processing pipelines presents a departure from traditional data pipelines: use… ▽ More Unstructured text has long been difficult to automatically analyze at scale. Large language models (LLMs) now offer a way forward by enabling {\em semantic data processing}, where familiar data processing operators (e.g., map, reduce, filter) are powered by LLMs instead of code. However, building effective semantic data processing pipelines presents a departure from traditional data pipelines: users need to understand their data to write effective pipelines, yet they need to construct pipelines to extract the data necessary for that understanding -- all while navigating LLM idiosyncrasies and inconsistencies. We present \docwrangler, a mixed-initiative integrated development environment (IDE) for semantic data processing with three novel features to address the gaps between the user, their data, and their pipeline: {\em (i) In-Situ User Notes} that allows users to inspect, annotate, and track observations across documents and LLM outputs, {\em (ii) LLM-Assisted Prompt Refinement} that transforms user notes into improved operations, and {\em (iii) LLM-Assisted Operation Decomposition} that identifies when operations or documents are too complex for the LLM to correctly process and suggests decompositions. Our evaluation combines a think-aloud study with 10 participants and a public-facing deployment (available at \href{https://docetl.org/playground}{docetl.org/playground}) with 1,500+ recorded sessions, revealing how users develop systematic strategies for their semantic data processing tasks; e.g., transforming open-ended operations into classifiers for easier validation and intentionally using vague prompts to learn more about their data or LLM capabilities. △ Less

Submitted 20 April, 2025; originally announced April 2025.

Comments: 18 pages; 11 figures; 3 tables

arXiv:2504.14473 [pdf, other]

Invariance-embedded Machine Learning Sub-grid-scale Stress Models for Meso-scale Hurricane Boundary Layer Flow Simulation I: Model Development and $\textit{a priori}$ Studies

Authors: Md Badrul Hasan, Meilin Yu, Tim Oates

Abstract: This study develops invariance-embedded machine learning sub-grid-scale (SGS) stress models admitting turbulence kinetic energy (TKE) backscatter towards more accurate large eddy simulation (LES) of meso-scale turbulent hurricane boundary layer flows. The new machine learning SGS model consists of two parts: a classification model used to distinguish regions with either strong energy cascade or en… ▽ More This study develops invariance-embedded machine learning sub-grid-scale (SGS) stress models admitting turbulence kinetic energy (TKE) backscatter towards more accurate large eddy simulation (LES) of meso-scale turbulent hurricane boundary layer flows. The new machine learning SGS model consists of two parts: a classification model used to distinguish regions with either strong energy cascade or energy backscatter from those with mild TKE transfer and a regression model used to calculate SGS stresses in regions with strong TKE transfer. To ease model implementation in computational fluid dynamics (CFD) solvers, the Smagorinsky model with a signed coefficient $C_s$, where a positive value indicates energy cascade while a negative one indicates energy backscatter, is employed as the carrier of the machine learning model. To improve its robustness and generality, both physical invariance and geometric invariance features of turbulent flows are embedded into the model input for classification and regression, and the signed Smagorinsky model coefficient is used as the output of the regression model. Different machine-learning methods and input setups have been used to test the classification model's performance. The F1-scores, which measure balanced precision and recall of a model, of the classification models with physical and geometric invariance embedded can be improved by about $17\%$ over those without considering geometric invariance. Regression models based on ensemble neural networks have demonstrated superior performance in predicting the signed Smagorinsky model coefficient, exceeding that of the dynamic Smagorinsky model in $\textit{a priori}$ tests. △ Less

Submitted 19 April, 2025; originally announced April 2025.

arXiv:2504.14050 [pdf, other]

MMformer with Adaptive Transferable Attention: Advancing Multivariate Time Series Forecasting for Environmental Applications

Authors: Ning Xin, Jionglong Su, Md Maruf Hasan

Abstract: Environmental crisis remains a global challenge that affects public health and environmental quality. Despite extensive research, accurately forecasting environmental change trends to inform targeted policies and assess prediction efficiency remains elusive. Conventional methods for multivariate time series (MTS) analysis often fail to capture the complex dynamics of environmental change. To addre… ▽ More Environmental crisis remains a global challenge that affects public health and environmental quality. Despite extensive research, accurately forecasting environmental change trends to inform targeted policies and assess prediction efficiency remains elusive. Conventional methods for multivariate time series (MTS) analysis often fail to capture the complex dynamics of environmental change. To address this, we introduce an innovative meta-learning MTS model, MMformer with Adaptive Transferable Multi-head Attention (ATMA), which combines self-attention and meta-learning for enhanced MTS forecasting. Specifically, MMformer is used to model and predict the time series of seven air quality indicators across 331 cities in China from January 2018 to June 2021 and the time series of precipitation and temperature at 2415 monitoring sites during the summer (276 days) from 2012 to 2014, validating the network's ability to perform and forecast MTS data successfully. Experimental results demonstrate that in these datasets, the MMformer model reaching SOTA outperforms iTransformer, Transformer, and the widely used traditional time series prediction algorithm SARIMAX in the prediction of MTS, reducing by 50\% in MSE, 20\% in MAE as compared to others in air quality datasets, reducing by 20\% in MAPE except SARIMAX. Compared with Transformer and SARIMAX in the climate datasets, MSE, MAE, and MAPE are decreased by 30\%, and there is an improvement compared to iTransformer. This approach represents a significant advance in our ability to forecast and respond to dynamic environmental quality challenges in diverse urban and rural environments. Its predictive capabilities provide valuable public health and environmental quality information, informing targeted interventions. △ Less

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.13990 [pdf, other]

PC-DeepNet: A GNSS Positioning Error Minimization Framework Using Permutation-Invariant Deep Neural Network

Authors: M. Humayun Kabir, Md. Ali Hasan, Md. Shafiqul Islam, Kyeongjun Ko, Wonjae Shin

Abstract: Global navigation satellite systems (GNSS) face significant challenges in urban and sub-urban areas due to non-line-of-sight (NLOS) propagation, multipath effects, and low received power levels, resulting in highly non-linear and non-Gaussian measurement error distributions. In light of this, conventional model-based positioning approaches, which rely on Gaussian error approximations, struggle to… ▽ More Global navigation satellite systems (GNSS) face significant challenges in urban and sub-urban areas due to non-line-of-sight (NLOS) propagation, multipath effects, and low received power levels, resulting in highly non-linear and non-Gaussian measurement error distributions. In light of this, conventional model-based positioning approaches, which rely on Gaussian error approximations, struggle to achieve precise localization under these conditions. To overcome these challenges, we put forth a novel learning-based framework, PC-DeepNet, that employs a permutation-invariant (PI) deep neural network (DNN) to estimate position corrections (PC). This approach is designed to ensure robustness against changes in the number and/or order of visible satellite measurements, a common issue in GNSS systems, while leveraging NLOS and multipath indicators as features to enhance positioning accuracy in challenging urban and sub-urban environments. To validate the performance of the proposed framework, we compare the positioning error with state-of-the-art model-based and learning-based positioning methods using two publicly available datasets. The results confirm that proposed PC-DeepNet achieves superior accuracy than existing model-based and learning-based methods while exhibiting lower computational complexity compared to previous learning-based approaches. △ Less

Submitted 18 April, 2025; originally announced April 2025.

Comments: 31 pages, 14 figures, 6 tables

arXiv:2504.11563 [pdf]

Dynamical electronic correlation and chiral magnetism in van der Waals magnet Fe4GeTe2

Authors: Md. Nur Hasan, Nastaran Salehi, Felix Sorgenfrei, Anna Delin, Igor Di Marco, Anders Bergman, Manuel Pereiro, Patrik Thunström, Olle Eriksson, Debjani Karmakar

Abstract: Among the quasi-2D van der Waals magnetic systems, Fe4GeTe2 imprints a profound impact due to its near-room temperature ferromagnetic behaviour and the complex magnetothermal phase diagram exhibiting multiple phase transformations, as observed from magnetization and magnetotransport measurements. A complete analysis of these phase transformations in the light of electronic correlation and its impa… ▽ More Among the quasi-2D van der Waals magnetic systems, Fe4GeTe2 imprints a profound impact due to its near-room temperature ferromagnetic behaviour and the complex magnetothermal phase diagram exhibiting multiple phase transformations, as observed from magnetization and magnetotransport measurements. A complete analysis of these phase transformations in the light of electronic correlation and its impact on the underlying magnetic interactions remain unattended in the existing literature. Using first-principles methodologies, incorporating the dynamical nature of electron correlation, we have analysed the interplay of the direction of magnetization in the easy-plane and easy-axis manner with the underlying crystal symmetry, which reveals the opening of a pseudogap feature beyond the spin-reorientation transition (SRT) temperature. The impact of dynamical correlation on the calculated magnetic circular dichroism and x-ray absorption spectrum of the L-edge of the Fe atoms compared well with the existing experimental observations. The calculated intersite Heisenberg exchange interactions display a complicated nature, depending upon the pairwise interactions among the two inequivalent Fe sites, indicating a RKKY-like behaviour of the magnetic interactions. We noted the existence of significant anisotropic and antisymmetric exchanges interactions, resulting into a chirality in the magnetic behaviour of the system. Subsequent investigation of the dynamical aspects of magnetism in Fe4GeTe2 and the respective magnetothermal phase diagram reveal that the dynamical nature of spins and the decoupling of the magnetic properties for both sites of Fe is crucial to explain all the experimentally observed phase transformations. △ Less

Submitted 15 April, 2025; originally announced April 2025.

Comments: Accepted for publication as a Regular Article in Physical Review B

arXiv:2504.10808 [pdf, other]

Tabular foundation model to detect empathy from visual cues

Authors: Md Rakibul Hasan, Shafin Rahman, Md Zakir Hossain, Aneesh Krishna, Tom Gedeon

Abstract: Detecting empathy from video interactions is an emerging area of research. Video datasets, however, are often released as extracted features (i.e., tabular data) rather than raw footage due to privacy and ethical concerns. Prior research on such tabular datasets established tree-based classical machine learning approaches as the best-performing models. Motivated by the recent success of textual fo… ▽ More Detecting empathy from video interactions is an emerging area of research. Video datasets, however, are often released as extracted features (i.e., tabular data) rather than raw footage due to privacy and ethical concerns. Prior research on such tabular datasets established tree-based classical machine learning approaches as the best-performing models. Motivated by the recent success of textual foundation models (i.e., large language models), we explore the use of tabular foundation models in empathy detection from tabular visual features. We experiment with two recent tabular foundation models $-$ TabPFN v2 and TabICL $-$ through in-context learning and fine-tuning setups. Our experiments on a public human-robot interaction benchmark demonstrate a significant boost in cross-subject empathy detection accuracy over several strong baselines (accuracy: $0.590 \rightarrow 0.730$; AUC: $0.564 \rightarrow 0.669$). In addition to performance improvement, we contribute novel insights and an evaluation setup to ensure generalisation on unseen subjects in this public benchmark. As the practice of releasing video features as tabular datasets is likely to persist due to privacy constraints, our findings will be widely applicable to future empathy detection video datasets as well. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.06598 [pdf, ps, other]

Stochastic Ray Tracing of Transparent 3D Gaussians

Authors: Xin Sun, Iliyan Georgiev, Yun Fei, Miloš Hašan

Abstract: 3D Gaussian splatting has been widely adopted as a 3D representation for novel-view synthesis, relighting, and 3D generation tasks. It delivers realistic and detailed results through a collection of explicit 3D Gaussian primitives, each carrying opacity and view-dependent color. However, efficient rendering of many transparent primitives remains a significant challenge. Existing approaches either… ▽ More 3D Gaussian splatting has been widely adopted as a 3D representation for novel-view synthesis, relighting, and 3D generation tasks. It delivers realistic and detailed results through a collection of explicit 3D Gaussian primitives, each carrying opacity and view-dependent color. However, efficient rendering of many transparent primitives remains a significant challenge. Existing approaches either rasterize the Gaussians with approximate per-view sorting or rely on high-end RTX GPUs. This paper proposes a stochastic ray-tracing method to render 3D clouds of transparent primitives. Instead of processing all ray-Gaussian intersections in sequential order, each ray traverses the acceleration structure only once, randomly accepting and shading a single intersection (or $N$ intersections, using a simple extension). This approach minimizes shading time and avoids primitive sorting along the ray, thereby minimizing register usage and maximizing parallelism even on low-end GPUs. The cost of rays through the Gaussian asset is comparable to that of standard mesh-intersection rays. The shading is unbiased and has low variance, as our stochastic acceptance achieves importance sampling based on accumulated weight. The alignment with Monte Carlo philosophy simplifies implementation and integration into a conventional path-tracing framework. △ Less

Submitted 9 June, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

Comments: 10 pages, 7 figures, 5 tables

arXiv:2504.05995 [pdf, other]

NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge

Authors: Firoj Alam, Md Arid Hasan, Sahinur Rahman Laskar, Mucahid Kutlu, Shammur Absar Chowdhury

Abstract: The rapid advancement of large language models (LLMs) has raised concerns about cultural bias, fairness, and their applicability in diverse linguistic and underrepresented regional contexts. To enhance and benchmark the capabilities of LLMs, there is a need to develop large-scale resources focused on multilingual, local, and cultural contexts. In this study, we propose a framework, NativQA, that c… ▽ More The rapid advancement of large language models (LLMs) has raised concerns about cultural bias, fairness, and their applicability in diverse linguistic and underrepresented regional contexts. To enhance and benchmark the capabilities of LLMs, there is a need to develop large-scale resources focused on multilingual, local, and cultural contexts. In this study, we propose a framework, NativQA, that can seamlessly construct large-scale, culturally and regionally aligned QA datasets in native languages. The framework utilizes user-defined seed queries and leverages search engines to collect location-specific, everyday information. It has been evaluated across 39 locations in 24 countries and in 7 languages, ranging from extremely low-resource to high-resource languages, which resulted over 300K Question Answer (QA) pairs. The developed resources can be used for LLM benchmarking and further fine-tuning. The framework has been made publicly available for the community (https://gitlab.com/nativqa/nativqa-framework). △ Less

Submitted 8 April, 2025; originally announced April 2025.

Comments: LLMs, Native, Multilingual, Language Diversity, Contextual Understanding, Minority Languages, Culturally Informed, Foundation Models, Large Language Models

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

arXiv:2504.04556 [pdf, other]

Online Facility Assignments on Polygons

Authors: Sumaiya Malik, Reyan Ahmed, Md. Manzurul Hasan

Abstract: We study the online facility assignment problem on regular polygons, where all sides are of equal length. The influence of specific geometric settings has remained mostly unexplored, even though classical online facility assignment problems have mainly dealt with linear and general metric spaces. We fill this gap by considering the following four basic geometric settings: equilateral triangles, re… ▽ More We study the online facility assignment problem on regular polygons, where all sides are of equal length. The influence of specific geometric settings has remained mostly unexplored, even though classical online facility assignment problems have mainly dealt with linear and general metric spaces. We fill this gap by considering the following four basic geometric settings: equilateral triangles, rectangles, regular $n$-polygons, and circles. The facilities are situated at fixed positions on the boundary, and customers appear sequentially on the boundary. A customer needs to be assigned immediately without any information about future customer arrivals. We study a natural greedy algorithm. First, we study an equilateral triangle with three facilities at its corners; customers can appear anywhere on the boundary. We then analyze regular $n$-sided polygons, obtaining a competitive ratio of $2n-1$, showing that the algorithm performance degrades linearly with the number of corner points for polygons. For the circular configuration, the competitive ratio is $2n-1$ when the distance between two adjacent facilities is the same. And the competitive ratios are $n^2-n+1$ and $2^n - 1$ for varying distances linearly and exponentially respectively. Each facility has a fixed capacity proportional to the geometric configuration, and customers appear only along the boundary edges. Our results also show that simpler geometric configurations have more efficient performance bounds and that spacing facilities uniformly apart prevent worst-case scenarios. The findings have many practical implications because large networks of facilities are best partitioned into smaller and geometrically simple pieces to guarantee good overall performance. △ Less

Submitted 6 April, 2025; originally announced April 2025.

arXiv:2504.03984 [pdf]

Optimized Feature Selection and Neural Network-Based Classification of Motor Imagery Using EEG Signals

Authors: Muhammad Sudipto Siam Dip, Mohammod Abdul Motin, Md. Anik Hasan, Sumaiya Kabir

Abstract: Objective: Machine learning- and deep learning-based models have recently been employed in motor imagery intention classification from electroencephalogram (EEG) signals. Nevertheless, there is a limited understanding of feature selection to assist in identifying the most significant features in different spatial locations. Methods: This study proposes a feature selection technique using sequentia… ▽ More Objective: Machine learning- and deep learning-based models have recently been employed in motor imagery intention classification from electroencephalogram (EEG) signals. Nevertheless, there is a limited understanding of feature selection to assist in identifying the most significant features in different spatial locations. Methods: This study proposes a feature selection technique using sequential forward feature selection with support vector machines and feeding the selected features to deep neural networks to classify motor imagery intention using multi-channel EEG. Results: The proposed model was evaluated with a publicly available dataset and achieved an average accuracy of 79.70 percent with a standard deviation of 7.98 percent for classifying two motor imagery scenarios. Conclusions: These results demonstrate that our method effectively identifies the most informative and discriminative characteristics of neural activity at different spatial locations, offering potential for future prosthetics and brain-computer interface applications. Significance: This approach enhances model performance while identifying key spatial EEG features, advancing brain-computer interfaces and prosthetic systems. △ Less

Submitted 4 April, 2025; originally announced April 2025.

arXiv:2504.02045 [pdf, other]

WorldPrompter: Traversable Text-to-Scene Generation

Authors: Zhaoyang Zhang, Yannick Hold-Geoffroy, Miloš Hašan, Chen Ziwen, Fujun Luan, Julie Dorsey, Yiwei Hu

Abstract: Scene-level 3D generation is a challenging research topic, with most existing methods generating only partial scenes and offering limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporate… ▽ More Scene-level 3D generation is a challenging research topic, with most existing methods generating only partial scenes and offering limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model achieves convincing view consistency across frames, enabling high-quality panoramic Gaussian splat reconstruction and facilitating traversal over an area of the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.00485 [pdf]

Stroke Disease Classification Using Machine Learning with Feature Selection Techniques

Authors: Mahade Hasan, Farhana Yasmin, Xue Yu

Abstract: Heart disease remains a leading cause of mortality and morbidity worldwide, necessitating the development of accurate and reliable predictive models to facilitate early detection and intervention. While state of the art work has focused on various machine learning approaches for predicting heart disease, but they could not able to achieve remarkable accuracy. In response to this need, we applied n… ▽ More Heart disease remains a leading cause of mortality and morbidity worldwide, necessitating the development of accurate and reliable predictive models to facilitate early detection and intervention. While state of the art work has focused on various machine learning approaches for predicting heart disease, but they could not able to achieve remarkable accuracy. In response to this need, we applied nine machine learning algorithms XGBoost, logistic regression, decision tree, random forest, k-nearest neighbors (KNN), support vector machine (SVM), gaussian naïve bayes (NB gaussian), adaptive boosting, and linear regression to predict heart disease based on a range of physiological indicators. Our approach involved feature selection techniques to identify the most relevant predictors, aimed at refining the models to enhance both performance and interpretability. The models were trained, incorporating processes such as grid search hyperparameter tuning, and cross-validation to minimize overfitting. Additionally, we have developed a novel voting system with feature selection techniques to advance heart disease classification. Furthermore, we have evaluated the models using key performance metrics including accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). Among the models, XGBoost demonstrated exceptional performance, achieving 99% accuracy, precision, F1-Score, 98% recall, and 100% ROC AUC. This study offers a promising approach to early heart disease diagnosis and preventive healthcare. △ Less

Submitted 21 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

arXiv:2503.24349 [pdf, other]

Faster Releases, Fewer Risks: A Study on Maven Artifact Vulnerabilities and Lifecycle Management

Authors: Md Shafiullah Shafin, Md Fazle Rabbi, S. M. Mahedy Hasan, Minhaz F. Zibran

Abstract: In modern software ecosystems, dependency management plays a critical role in ensuring secure and maintainable applications. However, understanding the relationship between release practices and their impact on vulnerabilities and update cycles remains a challenge. In this study, we analyze the release histories of 10,000 Maven artifacts, covering over 203,000 releases and 1.7 million dependencies… ▽ More In modern software ecosystems, dependency management plays a critical role in ensuring secure and maintainable applications. However, understanding the relationship between release practices and their impact on vulnerabilities and update cycles remains a challenge. In this study, we analyze the release histories of 10,000 Maven artifacts, covering over 203,000 releases and 1.7 million dependencies. We evaluate how release speed affects software security and lifecycle. Our results show an inverse relationship between release speed and dependency outdatedness. Artifacts with more frequent releases maintain significantly shorter outdated times. We also find that faster release cycles are linked to fewer CVEs in dependency chains, indicating a strong negative correlation. These findings emphasize the importance of accelerated release strategies in reducing security risks and ensuring timely updates. Our research provides valuable insights for software developers, maintainers, and ecosystem managers. △ Less

Submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.23764 [pdf, other]

WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation

Authors: Md Mahfuz Al Hasan, Mahdi Zaman, Abdul Jawad, Alberto Santamaria-Pang, Ho Hin Lee, Ivan Tarapov, Kyle See, Md Shah Imran, Antika Roy, Yaser Pourmohammadi Fallah, Navid Asadizanjani, Reza Forghani

Abstract: Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D-transformer that: i) leverages the fundamental frequency-domain properties of features for con… ▽ More Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D-transformer that: i) leverages the fundamental frequency-domain properties of features for contextual representation, and ii) is inspired by the top-down mechanism of the human visual recognition system, making it a biologically motivated architecture. By employing discrete wavelet transformations (DWT) at multiple scales, WaveFormer preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction. This significantly reduces the number of parameters, which is critical for real-world deployment where computational resources and training times are constrained. Furthermore, the model is generic and easily adaptable to diverse applications. Evaluations on BraTS2023, FLARE2021, and KiTS2023 demonstrate performance on par with state-of-the-art methods while offering substantially lower computational complexity. △ Less

Submitted 31 March, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.22902 [pdf, other]

Insights into Dependency Maintenance Trends in the Maven Ecosystem

Authors: Barisha Chowdhury, Md Fazle Rabbi, S. M. Mahedy Hasan, Minhaz F. Zibran

Abstract: As modern software development increasingly relies on reusable libraries and components, managing dependencies has become critical for ensuring software stability and security. However, challenges such as outdated dependencies, missed releases, and the complexity of interdependent libraries can significantly impact project maintenance. In this paper, we present a quantitative analysis of the Neo4j… ▽ More As modern software development increasingly relies on reusable libraries and components, managing dependencies has become critical for ensuring software stability and security. However, challenges such as outdated dependencies, missed releases, and the complexity of interdependent libraries can significantly impact project maintenance. In this paper, we present a quantitative analysis of the Neo4j dataset using the Goblin framework to uncover patterns of freshness in projects with different numbers of dependencies. Our analysis reveals that releases with fewer dependencies have a higher number of missed releases. Additionally, our study shows that the dependencies in the latest releases have positive freshness scores, indicating better software management efficacy. These results can encourage better management practices and contribute to the overall health of software ecosystems. △ Less

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.22710 [pdf]

doi 10.63125/fh49gz18

Assessing the influence of cybersecurity threats and risks on the adoption and growth of digital banking: a systematic literature review

Authors: Md. Waliullah, Md Zahin Hossain George, Md Tarek Hasan, Md Khorshed Alam, Mosa Sumaiya Khatun Munira, Noor Alam Siddiqui

Abstract: The rapid digitalization of banking services has significantly transformed financial transactions, offering enhanced convenience and efficiency for consumers. However, the increasing reliance on digital banking has also exposed financial institutions and users to a wide range of cybersecurity threats, including phishing, malware, ransomware, data breaches, and unauthorized access. This study syste… ▽ More The rapid digitalization of banking services has significantly transformed financial transactions, offering enhanced convenience and efficiency for consumers. However, the increasing reliance on digital banking has also exposed financial institutions and users to a wide range of cybersecurity threats, including phishing, malware, ransomware, data breaches, and unauthorized access. This study systematically examines the influence of cybersecurity threats on digital banking security, adoption, and regulatory compliance by conducting a comprehensive review of 78 peer-reviewed articles published between 2015 and 2024. Using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, this research critically evaluates the most prevalent cyber threats targeting digital banking platforms, the effectiveness of modern security measures, and the role of regulatory frameworks in mitigating financial cybersecurity risks. The findings reveal that phishing and malware attacks remain the most commonly exploited cyber threats, leading to significant financial losses and consumer distrust. Multi-factor authentication (MFA) and biometric security have been widely adopted to combat unauthorized access, while AI-driven fraud detection and blockchain technology offer promising solutions for securing financial transactions. However, the integration of third-party FinTech solutions introduces additional security risks, necessitating stringent regulatory oversight and cybersecurity protocols. The study also highlights that compliance with global cybersecurity regulations, such as GDPR, PSD2, and GLBA, enhances digital banking security by enforcing strict authentication measures, encryption protocols, and real-time fraud monitoring. △ Less

Submitted 22 March, 2025; originally announced March 2025.

Comments: 32 pages, 13 figures

arXiv:2503.21617 [pdf, other]

doi 10.1109/ICMLA61862.2024.00082

Leveraging Language Models for Analyzing Longitudinal Experiential Data in Education

Authors: Ahatsham Hayat, Bilal Khan, Mohammad Rashedul Hasan

Abstract: We propose a novel approach to leveraging pre-trained language models (LMs) for early forecasting of academic trajectories in STEM students using high-dimensional longitudinal experiential data. This data, which captures students' study-related activities, behaviors, and psychological states, offers valuable insights for forecasting-based interventions. Key challenges in handling such data include… ▽ More We propose a novel approach to leveraging pre-trained language models (LMs) for early forecasting of academic trajectories in STEM students using high-dimensional longitudinal experiential data. This data, which captures students' study-related activities, behaviors, and psychological states, offers valuable insights for forecasting-based interventions. Key challenges in handling such data include high rates of missing values, limited dataset size due to costly data collection, and complex temporal variability across modalities. Our approach addresses these issues through a comprehensive data enrichment process, integrating strategies for managing missing values, augmenting data, and embedding task-specific instructions and contextual cues to enhance the models' capacity for learning temporal patterns. Through extensive experiments on a curated student learning dataset, we evaluate both encoder-decoder and decoder-only LMs. While our findings show that LMs effectively integrate data across modalities and exhibit resilience to missing data, they primarily rely on high-level statistical patterns rather than demonstrating a deeper understanding of temporal dynamics. Furthermore, their ability to interpret explicit temporal information remains limited. This work advances educational data science by highlighting both the potential and limitations of LMs in modeling student trajectories for early intervention based on longitudinal experiential data. △ Less

Submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.21020 [pdf, other]

Nonlocality-enabled inverse design of Dirac-type and higher-order degeneracies for traveling and evanescent waves in phononic crystals

Authors: Sharat Paul, Md Nahid Hasan, Pai Wang

Abstract: We propose complete tailoring procedures with analytical precision for band degeneracies in one-dimensional (1D) nonlocal phononic crystals, focusing on the role of beyond-nearest-neighbor (BNN) interactions. Unlike trivial Dirac cones at either the center or boundary of Brillouin zone (BZ), we demonstrate non-trivial Dirac-type and higher-order band crossings at any desirable wave number within t… ▽ More We propose complete tailoring procedures with analytical precision for band degeneracies in one-dimensional (1D) nonlocal phononic crystals, focusing on the role of beyond-nearest-neighbor (BNN) interactions. Unlike trivial Dirac cones at either the center or boundary of Brillouin zone (BZ), we demonstrate non-trivial Dirac-type and higher-order band crossings at any desirable wave number within the BZ by tuning BNN interactions. Our analyses show that odd-indexed BNN interactions determine the quantity and wave number of degeneracy points, while even-indexed BNN interactions primarily affect the frequency. Moreover, we discover new evanescent wave modes and associated degeneracies in the complex-valued wave number. In addition, we study a varieties of spatial-temporal response patterns in time-domain simulations for the interplay between traveling and localized modes at the propagating and evanescent degeneracies. △ Less

Submitted 26 March, 2025; originally announced March 2025.

arXiv:2503.15442 [pdf, other]

A Space-Efficient Algorithm for Longest Common Almost Increasing Subsequence of Two Sequences

Authors: Md Tanzeem Rahat, Md. Manzurul Hasan, Debajyoti Mondal

Abstract: Let $A$ and $B$ be two number sequences of length $n$ and $m$, respectively, where $m\le n$. Given a positive number $δ$, a common almost increasing sequence $s_1\ldots s_k$ is a common subsequence for both $A$ and $B$ such that for all $2\le i\le k$, $s_i+δ> \max_{1\le j < i} s_j$. The LCaIS problem seeks to find the longest common almost increasing subsequence (LCaIS) of $A$ and $B$. An LCaIS ca… ▽ More Let $A$ and $B$ be two number sequences of length $n$ and $m$, respectively, where $m\le n$. Given a positive number $δ$, a common almost increasing sequence $s_1\ldots s_k$ is a common subsequence for both $A$ and $B$ such that for all $2\le i\le k$, $s_i+δ> \max_{1\le j < i} s_j$. The LCaIS problem seeks to find the longest common almost increasing subsequence (LCaIS) of $A$ and $B$. An LCaIS can be computed in $O(nm\ell)$ time and $O(nm)$ space [Ta, Shieh, Lu (TCS 2021)], where $\ell$ is the length of the LCaIS of $A$ and $B$. In this paper we first give an $O(nm\ell)$-time and $O(n+m\ell)$-space algorithm to find LCaIS, which improves the space complexity. We then design an $O((n+m)\log n +\mathcal{M}\log \mathcal{M} + \mathcal{C}\ell)$-time and $O(\mathcal{M}(\ell+\log \mathcal{M}))$-space algorithm, which is faster when the number of matching pairs $\mathcal{M}$ and the number of compatible matching pairs $\mathcal{C}$ are in $o(nm/\log m)$. △ Less

Submitted 7 May, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

MSC Class: 68W01; 68R01 ACM Class: G.2.1

arXiv:2503.14556 [pdf]

doi 10.62754/joe.v4i2.6610

Designing and Deploying AI Models for Sustainable Logistics Optimization: A Case Study on Eco-Efficient Supply Chains in the USA

Authors: Reza E Rabbi Shawon, MD Rokibul Hasan, Md Anisur Rahman, Mohamed Ghandri, Iman Ahmed Lamari, Mohammed Kawsar, Rubi Akter

Abstract: The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly transformed logistics and supply chain management, particularly in the pursuit of sustainability and eco-efficiency. This study explores AI-based methodologies for optimizing logistics operations in the USA, focusing on reducing environmental impact, improving fuel efficiency, and minimizing costs. Key… ▽ More The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly transformed logistics and supply chain management, particularly in the pursuit of sustainability and eco-efficiency. This study explores AI-based methodologies for optimizing logistics operations in the USA, focusing on reducing environmental impact, improving fuel efficiency, and minimizing costs. Key AI applications include predictive analytics for demand forecasting, route optimization through machine learning, and AI-powered fuel efficiency strategies. Various models, such as Linear Regression, XGBoost, Support Vector Machine, and Neural Networks, are applied to real-world logistics datasets to reduce carbon emissions based on logistics operations, optimize travel routes to minimize distance and travel time, and predict future deliveries to plan optimal routes. Other models such as K-Means and DBSCAN are also used to optimize travel routes to minimize distance and travel time for logistics operations. This study utilizes datasets from logistics companies' databases. The study also assesses model performance using metrics such as mean absolute error (MAE), mean squared error (MSE), and R2 score. This study also explores how these models can be deployed to various platforms for real-time logistics and supply chain use. The models are also examined through a thorough case study, highlighting best practices and regulatory frameworks that promote sustainability. The findings demonstrate AI's potential to enhance logistics efficiency, reduce carbon footprints, and contribute to a more resilient and adaptive supply chain ecosystem. △ Less

Submitted 17 March, 2025; originally announced March 2025.

Showing 1–50 of 956 results for author: Hašan, M