-
Patient Similarity Computation for Clinical Decision Support: An Efficient Use of Data Transformation, Combining Static and Time Series Data
Authors:
Joydeb Kumar Sana,
Mohammad M. Masud,
M Sohel Rahman,
M Saifur Rahman
Abstract:
Patient similarity computation (PSC) is a fundamental problem in healthcare informatics. The aim of the patient similarity computation is to measure the similarity among patients according to their historical clinical records, which helps to improve clinical decision support. This paper presents a novel distributed patient similarity computation (DPSC) technique based on data transformation (DT) m…
▽ More
Patient similarity computation (PSC) is a fundamental problem in healthcare informatics. The aim of the patient similarity computation is to measure the similarity among patients according to their historical clinical records, which helps to improve clinical decision support. This paper presents a novel distributed patient similarity computation (DPSC) technique based on data transformation (DT) methods, utilizing an effective combination of time series and static data. Time series data are sensor-collected patients' information, including metrics like heart rate, blood pressure, Oxygen saturation, respiration, etc. The static data are mainly patient background and demographic data, including age, weight, height, gender, etc. Static data has been used for clustering the patients. Before feeding the static data to the machine learning model adaptive Weight-of-Evidence (aWOE) and Z-score data transformation (DT) methods have been performed, which improve the prediction performances. In aWOE-based patient similarity models, sensitive patient information has been processed using aWOE which preserves the data privacy of the trained models. We used the Dynamic Time Warping (DTW) approach, which is robust and very popular, for time series similarity. However, DTW is not suitable for big data due to the significant computational run-time. To overcome this problem, distributed DTW computation is used in this study. For Coronary Artery Disease, our DT based approach boosts prediction performance by as much as 11.4%, 10.20%, and 12.6% in terms of AUC, accuracy, and F-measure, respectively. In the case of Congestive Heart Failure (CHF), our proposed method achieves performance enhancement up to 15.9%, 10.5%, and 21.9% for the same measures, respectively. The proposed method reduces the computation time by as high as 40%.
△ Less
Submitted 8 June, 2025;
originally announced June 2025.
-
Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and Languages
Authors:
Asif Shahriar,
Rifat Shahriyar,
M Saifur Rahman
Abstract:
Conventional transformer models typically compress the information from all tokens in a sequence into a single \texttt{[CLS]} token to represent global context-- an approach that can lead to information loss in tasks requiring localized or hierarchical cues. In this work, we introduce \textit{Inceptive Transformer}, a modular and lightweight architecture that enriches transformer-based token repre…
▽ More
Conventional transformer models typically compress the information from all tokens in a sequence into a single \texttt{[CLS]} token to represent global context-- an approach that can lead to information loss in tasks requiring localized or hierarchical cues. In this work, we introduce \textit{Inceptive Transformer}, a modular and lightweight architecture that enriches transformer-based token representations by integrating a multi-scale feature extraction module inspired by inception networks. Our model is designed to balance local and global dependencies by dynamically weighting tokens based on their relevance to a particular task. Evaluation across a diverse range of tasks including emotion recognition (both English and Bangla), irony detection, disease identification, and anti-COVID vaccine tweets classification shows that our models consistently outperform the baselines by 1\% to 14\% while maintaining efficiency. These findings highlight the versatility and cross-lingual applicability of our method for enriching transformer-based representations across diverse domains.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
CrosGrpsABS: Cross-Attention over Syntactic and Semantic Graphs for Aspect-Based Sentiment Analysis in a Low-Resource Language
Authors:
Md. Mithun Hossain,
Md. Shakil Hossain,
Sudipto Chaki,
Md. Rajib Hossain,
Md. Saifur Rahman,
A. B. M. Shawkat Ali
Abstract:
Aspect-Based Sentiment Analysis (ABSA) is a fundamental task in natural language processing, offering fine-grained insights into opinions expressed in text. While existing research has largely focused on resource-rich languages like English which leveraging large annotated datasets, pre-trained models, and language-specific tools. These resources are often unavailable for low-resource languages su…
▽ More
Aspect-Based Sentiment Analysis (ABSA) is a fundamental task in natural language processing, offering fine-grained insights into opinions expressed in text. While existing research has largely focused on resource-rich languages like English which leveraging large annotated datasets, pre-trained models, and language-specific tools. These resources are often unavailable for low-resource languages such as Bengali. The ABSA task in Bengali remains poorly explored and is further complicated by its unique linguistic characteristics and a lack of annotated data, pre-trained models, and optimized hyperparameters. To address these challenges, this research propose CrosGrpsABS, a novel hybrid framework that leverages bidirectional cross-attention between syntactic and semantic graphs to enhance aspect-level sentiment classification. The CrosGrpsABS combines transformerbased contextual embeddings with graph convolutional networks, built upon rule-based syntactic dependency parsing and semantic similarity computations. By employing bidirectional crossattention, the model effectively fuses local syntactic structure with global semantic context, resulting in improved sentiment classification performance across both low- and high-resource settings. We evaluate CrosGrpsABS on four low-resource Bengali ABSA datasets and the high-resource English SemEval 2014 Task 4 dataset. The CrosGrpsABS consistently outperforms existing approaches, achieving notable improvements, including a 0.93% F1-score increase for the Restaurant domain and a 1.06% gain for the Laptop domain in the SemEval 2014 Task 4 benchmark.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
Authors:
Md Ahsanul Haque,
Ismail Hossain,
Md Mahmuduzzaman Kamol,
Md Jahangir Alam,
Suresh Kumar Amalapuram,
Sajedul Talukder,
Mohammad Saidur Rahman
Abstract:
Machine learning (ML)-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the continuous emergence of both benign and malicious applications. Prior studies have shown that such con…
▽ More
Machine learning (ML)-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the continuous emergence of both benign and malicious applications. Prior studies have shown that such concept drift -- distributional shifts in benign and malicious samples, leads to significant degradation in detection performance over time. Despite the practical importance of this issue, existing datasets are often outdated and limited in temporal scope, diversity of malware families, and sample scale, making them insufficient for the systematic evaluation of concept drift in malware detection.
To address this gap, we present LAMDA, the largest and most temporally diverse Android malware benchmark to date, designed specifically for concept drift analysis. LAMDA spans 12 years (2013-2025, excluding 2015), includes over 1 million samples (approximately 37% labeled as malware), and covers 1,380 malware families and 150,000 singleton samples, reflecting the natural distribution and evolution of real-world Android applications. We empirically demonstrate LAMDA's utility by quantifying the performance degradation of standard ML models over time and analyzing feature stability across years. As the most comprehensive Android malware dataset to date, LAMDA enables in-depth research into temporal drift, generalization, explainability, and evolving detection challenges. The dataset and code are available at: https://iqsec-lab.github.io/LAMDA/.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
LogStamping: A blockchain-based log auditing approach for large-scale systems
Authors:
Md Shariful Islam,
M. Sohel Rahman
Abstract:
Log management is crucial for ensuring the security, integrity, and compliance of modern information systems. Traditional log management solutions face challenges in achieving tamper-proofing, scalability, and real-time processing in distributed environments. This paper presents a blockchain-based log management framework that addresses these limitations by leveraging blockchain's decentralized, i…
▽ More
Log management is crucial for ensuring the security, integrity, and compliance of modern information systems. Traditional log management solutions face challenges in achieving tamper-proofing, scalability, and real-time processing in distributed environments. This paper presents a blockchain-based log management framework that addresses these limitations by leveraging blockchain's decentralized, immutable, and transparent features. The framework integrates a hybrid on-chain and off-chain storage model, combining blockchain's integrity guarantees with the scalability of distributed storage solutions like IPFS. Smart contracts automate log validation and access control, while cryptographic techniques ensure privacy and confidentiality. With a focus on real-time log processing, the framework is designed to handle the high-volume log generation typical in large-scale systems, such as data centers and network infrastructure. Performance evaluations demonstrate the framework's scalability, low latency, and ability to manage millions of log entries while maintaining strong security guarantees. Additionally, the paper discusses challenges like blockchain storage overhead and energy consumption, offering insights for enhancing future systems.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement
Authors:
Nirjhor Datta,
Afroza Akther,
M. Sohel Rahman
Abstract:
Image enhancement methods often prioritize pixel level information, overlooking the semantic features. We propose a novel, unsupervised, fuzzy-inspired image enhancement framework guided by NSGA-II algorithm that optimizes image brightness, contrast, and gamma parameters to achieve a balance between visual quality and semantic fidelity. Central to our proposed method is the use of a pre trained de…
▽ More
Image enhancement methods often prioritize pixel level information, overlooking the semantic features. We propose a novel, unsupervised, fuzzy-inspired image enhancement framework guided by NSGA-II algorithm that optimizes image brightness, contrast, and gamma parameters to achieve a balance between visual quality and semantic fidelity. Central to our proposed method is the use of a pre trained deep neural network as a feature extractor. To find the best enhancement settings, we use a GPU-accelerated NSGA-II algorithm that balances multiple objectives, namely, increasing image entropy, improving perceptual similarity, and maintaining appropriate brightness. We further improve the results by applying a local search phase to fine-tune the top candidates from the genetic algorithm. Our approach operates entirely without paired training data making it broadly applicable across domains with limited or noisy labels. Quantitatively, our model achieves excellent performance with average BRISQUE and NIQE scores of 19.82 and 3.652, respectively, in all unpaired datasets. Qualitatively, enhanced images by our model exhibit significantly improved visibility in shadowed regions, natural balance of contrast and also preserve the richer fine detail without introducing noticable artifacts. This work opens new directions for unsupervised image enhancement where semantic consistency is critical.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Low latency FPGA implementation of twisted Edward curve cryptography hardware accelerator over prime field
Authors:
Md Rownak Hossain,
Md Sazedur Rahman,
Kh Shahriya Zaman,
Walid El Fezzani,
Mohammad Arif Sobhan Bhuiyan,
Chia Chao Kang,
Teh Jia Yew,
Mahdi H. Miraz
Abstract:
The performance of any elliptic curve cryptography hardware accelerator significantly relies on the efficiency of the underlying point multiplication (PM) architecture. This article presents a hardware implementation of field-programmable gate array (FPGA) based modular arithmetic, group operation, and point multiplication unit on the twisted Edwards curve (Edwards25519) over the 256-bit prime fie…
▽ More
The performance of any elliptic curve cryptography hardware accelerator significantly relies on the efficiency of the underlying point multiplication (PM) architecture. This article presents a hardware implementation of field-programmable gate array (FPGA) based modular arithmetic, group operation, and point multiplication unit on the twisted Edwards curve (Edwards25519) over the 256-bit prime field. An original hardware architecture of a unified point operation module in projective coordinates that executes point addition and point doubling within a single module has been developed, taking only 646 clock cycles and ensuring a better security level than conventional approaches. The proposed point multiplication module consumes 1.4 ms time, operating at a maximal clock frequency of 117.8 MHz utilising 164,730 clock cycles having 183.38 kbps throughput on the Xilinx Virtex-5 FPGA platform for 256-bit length of key. The comparative assessment of latency and throughput across various related recent works indicates the effectiveness of our proposed PM architecture. Finally, this high throughput and low latency PM architecture will be a good candidate for rapid data encryption in high-speed wireless communication networks.
△ Less
Submitted 30 April, 2025;
originally announced April 2025.
-
Do Automatic Comment Generation Techniques Fall Short? Exploring the Influence of Method Dependencies on Code Understanding
Authors:
Md Mustakim Billah,
Md Shamimur Rahman,
Banani Roy
Abstract:
Method-level comments are critical for improving code comprehension and supporting software maintenance. With advancements in large language models (LLMs), automated comment generation has become a major research focus. However, existing approaches often overlook method dependencies, where one method relies on or calls others, affecting comment quality and code understandability. This study invest…
▽ More
Method-level comments are critical for improving code comprehension and supporting software maintenance. With advancements in large language models (LLMs), automated comment generation has become a major research focus. However, existing approaches often overlook method dependencies, where one method relies on or calls others, affecting comment quality and code understandability. This study investigates the prevalence and impact of dependent methods in software projects and introduces a dependency-aware approach for method-level comment generation. Analyzing a dataset of 10 popular Java GitHub projects, we found that dependent methods account for 69.25% of all methods and exhibit higher engagement and change proneness compared to independent methods. Across 448K dependent and 199K independent methods, we observed that state-of-the-art fine-tuned models (e.g., CodeT5+, CodeBERT) struggle to generate comprehensive comments for dependent methods, a trend also reflected in LLM-based approaches like ASAP. To address this, we propose HelpCOM, a novel dependency-aware technique that incorporates helper method information to improve comment clarity, comprehensiveness, and relevance. Experiments show that HelpCOM outperforms baseline methods by 5.6% to 50.4% across syntactic (e.g., BLEU), semantic (e.g., SentenceBERT), and LLM-based evaluation metrics. A survey of 156 software practitioners further confirms that HelpCOM significantly improves the comprehensibility of code involving dependent methods, highlighting its potential to enhance documentation, maintainability, and developer productivity in large-scale systems.
△ Less
Submitted 27 April, 2025;
originally announced April 2025.
-
Machine Learning-Based Detection and Analysis of Suspicious Activities in Bitcoin Wallet Transactions in the USA
Authors:
Md Zahidul Islam,
Md Shahidul Islam,
Biswajit Chandra das,
Syed Ali Reza,
Proshanta Kumar Bhowmik,
Kanchon Kumar Bishnu,
Md Shafiqur Rahman,
Redoyan Chowdhury,
Laxmi Pant
Abstract:
The dramatic adoption of Bitcoin and other cryptocurrencies in the USA has revolutionized the financial landscape and provided unprecedented investment and transaction efficiency opportunities. The prime objective of this research project is to develop machine learning algorithms capable of effectively identifying and tracking suspicious activity in Bitcoin wallet transactions. With high-tech anal…
▽ More
The dramatic adoption of Bitcoin and other cryptocurrencies in the USA has revolutionized the financial landscape and provided unprecedented investment and transaction efficiency opportunities. The prime objective of this research project is to develop machine learning algorithms capable of effectively identifying and tracking suspicious activity in Bitcoin wallet transactions. With high-tech analysis, the study aims to create a model with a feature for identifying trends and outliers that can expose illicit activity. The current study specifically focuses on Bitcoin transaction information in America, with a strong emphasis placed on the importance of knowing about the immediate environment in and through which such transactions pass through. The dataset is composed of in-depth Bitcoin wallet transactional information, including important factors such as transaction values, timestamps, network flows, and addresses for wallets. All entries in the dataset expose information about financial transactions between wallets, including received and sent transactions, and such information is significant for analysis and trends that can represent suspicious activity. This study deployed three accredited algorithms, most notably, Logistic Regression, Random Forest, and Support Vector Machines. In retrospect, Random Forest emerged as the best model with the highest F1 Score, showcasing its ability to handle non-linear relationships in the data. Insights revealed significant patterns in wallet activity, such as the correlation between unredeemed transactions and final balances. The application of machine algorithms in tracking cryptocurrencies is a tool for creating transparent and secure U.S. markets.
△ Less
Submitted 3 April, 2025;
originally announced April 2025.
-
GroundHog: Revolutionizing GLDAS Groundwater Storage Downscaling for Enhanced Recharge Estimation in Bangladesh
Authors:
Saleh Sakib Ahmed,
Rashed Uz Zzaman,
Saifur Rahman Jony,
Faizur Rahman Himel,
Afroza Sharmin,
A. H. M. Khalequr Rahman,
M. Sohel Rahman,
Sara Nowreen
Abstract:
Long-term groundwater level (GWL) measurement is vital for effective policymaking and recharge estimation using annual maxima and minima. However, current methods prioritize short-term predictions and lack multi-year applicability, limiting their utility. Moreover, sparse in-situ measurements lead to reliance on low-resolution satellite data like GLDAS as the ground truth for Machine Learning mode…
▽ More
Long-term groundwater level (GWL) measurement is vital for effective policymaking and recharge estimation using annual maxima and minima. However, current methods prioritize short-term predictions and lack multi-year applicability, limiting their utility. Moreover, sparse in-situ measurements lead to reliance on low-resolution satellite data like GLDAS as the ground truth for Machine Learning models, further constraining accuracy. To overcome these challenges, we first develop an ML model to mitigate data gaps, achieving $R^2$ scores of 0.855 and 0.963 for maximum and minimum GWL predictions, respectively. Subsequently, using these predictions and well observations as ground truth, we train an Upsampling Model that uses low-resolution (25 km) GLDAS data as input to produce high-resolution (2 km) GWLs, achieving an excellent $R^2$ score of 0.96. Our approach successfully upscales GLDAS data for 2003-2024, allowing high-resolution recharge estimations and revealing critical trends for proactive resource management. Our method allows upsampling of groundwater storage (GWS) from GLDAS to high-resolution GWLs for any points independently of officially curated piezometer data, making it a valuable tool for decision-making.
△ Less
Submitted 28 March, 2025;
originally announced March 2025.
-
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data
Authors:
Md. Shaheenur Islam Sumon,
Md. Sakib Bin Islam,
Md. Sohanur Rahman,
Md. Sakib Abrar Hossain,
Amith Khandakar,
Anwarul Hasan,
M Murugappan,
Muhammad E. H. Chowdhury
Abstract:
The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. A multi-headed self-attention mechanism, widely used in natural language processing (NLP), is operated by Transformers to understand feature interactions in feature spaces. However, the relationships between various features within bi…
▽ More
The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. A multi-headed self-attention mechanism, widely used in natural language processing (NLP), is operated by Transformers to understand feature interactions in feature spaces. However, the relationships between various features within biological systems remain ambiguous in these spaces, highlighting the necessity of early detection and prediction of cardiovascular diseases to reduce the severe morbidity and mortality with these conditions worldwide. We handle this issue with CardioTabNet, which exploits the strength of tab transformer to extract feature space which carries strong understanding of clinical cardiovascular data and its feature ranking. As a result, performance of downstream classical models significantly showed outstanding result. Our study utilizes the open-source dataset for heart disease prediction with 1190 instances and 11 features. In total, 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope). Tab transformer was used to extract important features and ranked them using random forest (RF) feature ranking algorithm. Ten machine-learning models were used to predict heart disease using selected features. After extracting high-quality features, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy rate of 94.1% and an average Area Under Curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. A benchmarking study was conducted using state-of-the-art models to evaluate our transformer-driven framework.
△ Less
Submitted 22 March, 2025;
originally announced March 2025.
-
Universal point spread function engineering for 3D optical information processing
Authors:
Md Sadman Sakib Rahman,
Aydogan Ozcan
Abstract:
Point spread function (PSF) engineering has been pivotal in the remarkable progress made in high-resolution imaging in the last decades. However, the diversity in PSF structures attainable through existing engineering methods is limited. Here, we report universal PSF engineering, demonstrating a method to synthesize an arbitrary set of spatially varying 3D PSFs between the input and output volumes…
▽ More
Point spread function (PSF) engineering has been pivotal in the remarkable progress made in high-resolution imaging in the last decades. However, the diversity in PSF structures attainable through existing engineering methods is limited. Here, we report universal PSF engineering, demonstrating a method to synthesize an arbitrary set of spatially varying 3D PSFs between the input and output volumes of a spatially incoherent diffractive processor composed of cascaded transmissive surfaces. We rigorously analyze the PSF engineering capabilities of such diffractive processors within the diffraction limit of light and provide numerical demonstrations of unique imaging capabilities, such as snapshot 3D multispectral imaging without involving any spectral filters, axial scanning or digital reconstruction steps, which is enabled by the spatial and spectral engineering of 3D PSFs. Our framework and analysis would be important for future advancements in computational imaging, sensing and diffractive processing of 3D optical information.
△ Less
Submitted 9 February, 2025;
originally announced February 2025.
-
MADAR: Efficient Continual Learning for Malware Analysis with Diversity-Aware Replay
Authors:
Mohammad Saidur Rahman,
Scott Coull,
Qi Yu,
Matthew Wright
Abstract:
Millions of new pieces of malicious software (i.e., malware) are introduced each year. This poses significant challenges for antivirus vendors, who use machine learning to detect and analyze malware, and must keep up with changes in the distribution while retaining knowledge of older variants. Continual learning (CL) holds the potential to address this challenge by reducing the storage and computa…
▽ More
Millions of new pieces of malicious software (i.e., malware) are introduced each year. This poses significant challenges for antivirus vendors, who use machine learning to detect and analyze malware, and must keep up with changes in the distribution while retaining knowledge of older variants. Continual learning (CL) holds the potential to address this challenge by reducing the storage and computational costs of regularly retraining over all the collected data. Prior work, however, shows that CL techniques, which are designed primarily for computer vision tasks, fare poorly when applied to malware classification. To address these issues, we begin with an exploratory analysis of a typical malware dataset, which reveals that malware families are diverse and difficult to characterize, requiring a wide variety of samples to learn a robust representation. Based on these findings, we propose $\underline{M}$alware $\underline{A}$nalysis with $\underline{D}$iversity-$\underline{A}$ware $\underline{R}$eplay (MADAR), a CL framework that accounts for the unique properties and challenges of the malware data distribution. Through extensive evaluation on large-scale Windows and Android malware datasets, we show that MADAR significantly outperforms prior work. This highlights the importance of understanding domain characteristics when designing CL techniques and demonstrates a path forward for the malware classification domain.
△ Less
Submitted 8 February, 2025;
originally announced February 2025.
-
Breaking the Fake News Barrier: Deep Learning Approaches in Bangla Language
Authors:
Pronoy Kumar Mondal,
Sadman Sadik Khan,
Md. Masud Rana,
Shahriar Sultan Ramit,
Abdus Sattar,
Md. Sadekur Rahman
Abstract:
The rapid development of digital stages has greatly compounded the dispersal of untrue data, dissolving certainty and judgment in society, especially among the Bengali-speaking community. Our ponder addresses this critical issue by presenting an interesting strategy that utilizes a profound learning innovation, particularly the Gated Repetitive Unit (GRU), to recognize fake news within the Bangla…
▽ More
The rapid development of digital stages has greatly compounded the dispersal of untrue data, dissolving certainty and judgment in society, especially among the Bengali-speaking community. Our ponder addresses this critical issue by presenting an interesting strategy that utilizes a profound learning innovation, particularly the Gated Repetitive Unit (GRU), to recognize fake news within the Bangla dialect. The strategy of our proposed work incorporates intensive information preprocessing, which includes lemmatization, tokenization, and tending to course awkward nature by oversampling. This comes about in a dataset containing 58,478 passages. We appreciate the creation of a demonstration based on GRU (Gated Repetitive Unit) that illustrates remarkable execution with a noteworthy precision rate of 94%. This ponder gives an intensive clarification of the methods included in planning the information, selecting the show, preparing it, and assessing its execution. The performance of the model is investigated by reliable metrics like precision, recall, F1 score, and accuracy. The commitment of the work incorporates making a huge fake news dataset in Bangla and a demonstration that has outperformed other Bangla fake news location models.
△ Less
Submitted 30 January, 2025;
originally announced January 2025.
-
DFCon: Attention-Driven Supervised Contrastive Learning for Robust Deepfake Detection
Authors:
MD Sadik Hossain Shanto,
Mahir Labib Dihan,
Souvik Ghosh,
Riad Ahmed Anonto,
Hafijul Hoque Chowdhury,
Abir Muhtasim,
Rakib Ahsan,
MD Tanvir Hassan,
MD Roqunuzzaman Sojib,
Sheikh Azizul Hakim,
M. Saifur Rahman
Abstract:
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), focusing on detecting deepfakes across diverse datasets. Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation. These models were specifically chosen for their complementary streng…
▽ More
This report presents our approach for the IEEE SP Cup 2025: Deepfake Face Detection in the Wild (DFWild-Cup), focusing on detecting deepfakes across diverse datasets. Our methodology employs advanced backbone models, including MaxViT, CoAtNet, and EVA-02, fine-tuned using supervised contrastive loss to enhance feature separation. These models were specifically chosen for their complementary strengths. Integration of convolution layers and strided attention in MaxViT is well-suited for detecting local features. In contrast, hybrid use of convolution and attention mechanisms in CoAtNet effectively captures multi-scale features. Robust pretraining with masked image modeling of EVA-02 excels at capturing global features. After training, we freeze the parameters of these models and train the classification heads. Finally, a majority voting ensemble is employed to combine the predictions from these models, improving robustness and generalization to unseen scenarios. The proposed system addresses the challenges of detecting deepfakes in real-world conditions and achieves a commendable accuracy of 95.83% on the validation dataset.
△ Less
Submitted 27 January, 2025;
originally announced January 2025.
-
A light-weight model to generate NDWI from Sentinel-1
Authors:
Saleh Sakib Ahmed,
Saifur Rahman Jony,
Md. Toufikuzzaman,
Saifullah Sayed,
Rashed Uz Zzaman,
Sara Nowreen,
M. Sohel Rahman
Abstract:
The use of Sentinel-2 images to compute Normalized Difference Water Index (NDWI) has many applications, including water body area detection. However, cloud cover poses significant challenges in this regard, which hampers the effectiveness of Sentinel-2 images in this context. In this paper, we present a deep learning model that can generate NDWI given Sentinel-1 images, thereby overcoming this clo…
▽ More
The use of Sentinel-2 images to compute Normalized Difference Water Index (NDWI) has many applications, including water body area detection. However, cloud cover poses significant challenges in this regard, which hampers the effectiveness of Sentinel-2 images in this context. In this paper, we present a deep learning model that can generate NDWI given Sentinel-1 images, thereby overcoming this cloud barrier. We show the effectiveness of our model, where it demonstrates a high accuracy of 0.9134 and an AUC of 0.8656 to predict the NDWI. Additionally, we observe promising results with an R2 score of 0.4984 (for regressing the NDWI values) and a Mean IoU of 0.4139 (for the underlying segmentation task). In conclusion, our model offers a first and robust solution for generating NDWI images directly from Sentinel-1 images and subsequent use for various applications even under challenging conditions such as cloud cover and nighttime.
△ Less
Submitted 22 January, 2025;
originally announced January 2025.
-
Empowering Agricultural Insights: RiceLeafBD -- A Novel Dataset and Optimal Model Selection for Rice Leaf Disease Diagnosis through Transfer Learning Technique
Authors:
Sadia Afrin Rimi,
Md. Jalal Uddin Chowdhury,
Rifat Abdullah,
Iftekhar Ahmed,
Mahrima Akter Mim,
Mohammad Shoaib Rahman
Abstract:
The number of people living in this agricultural nation of ours, which is surrounded by lush greenery, is growing on a daily basis. As a result of this, the level of arable land is decreasing, as well as residential houses and industrial factories. The food crisis is becoming the main threat for us in the upcoming days. Because on the one hand, the population is increasing, and on the other hand,…
▽ More
The number of people living in this agricultural nation of ours, which is surrounded by lush greenery, is growing on a daily basis. As a result of this, the level of arable land is decreasing, as well as residential houses and industrial factories. The food crisis is becoming the main threat for us in the upcoming days. Because on the one hand, the population is increasing, and on the other hand, the amount of food crop production is decreasing due to the attack of diseases. Rice is one of the most significant cultivated crops since it provides food for more than half of the world's population. Bangladesh is dependent on rice (Oryza sativa) as a vital crop for its agriculture, but it faces a significant problem as a result of the ongoing decline in rice yield brought on by common diseases. Early disease detection is the main difficulty in rice crop cultivation. In this paper, we proposed our own dataset, which was collected from the Bangladesh field, and also applied deep learning and transfer learning models for the evaluation of the datasets. We elaborately explain our dataset and also give direction for further research work to serve society using this dataset. We applied a light CNN model and pre-trained InceptionNet-V2, EfficientNet-V2, and MobileNet-V2 models, which achieved 91.5% performance for the EfficientNet-V2 model of this work. The results obtained assaulted other models and even exceeded approaches that are considered to be part of the state of the art. It has been demonstrated by this study that it is possible to precisely and effectively identify diseases that affect rice leaves using this unbiased datasets. After analysis of the performance of different models, the proposed datasets are significant for the society for research work to provide solutions for decreasing rice leaf disease.
△ Less
Submitted 15 January, 2025;
originally announced January 2025.
-
Strategic Fusion Optimizes Transformer Compression
Authors:
Md Shoaibur Rahman
Abstract:
This study investigates transformer model compression by systematically pruning its layers. We evaluated 14 pruning strategies across nine diverse datasets, including 12 strategies based on different signals obtained from layer activations, mutual information, gradients, weights, and attention. To address the limitations of single-signal strategies, we introduced two fusion strategies, linear regr…
▽ More
This study investigates transformer model compression by systematically pruning its layers. We evaluated 14 pruning strategies across nine diverse datasets, including 12 strategies based on different signals obtained from layer activations, mutual information, gradients, weights, and attention. To address the limitations of single-signal strategies, we introduced two fusion strategies, linear regression and random forest, which combine individual strategies (i.e., strategic fusion), for more informed pruning decisions. Additionally, we applied knowledge distillation to mitigate any accuracy loss during layer pruning. Our results reveal that random forest strategic fusion outperforms individual strategies in seven out of nine datasets and achieves near-optimal performance in the other two. The distilled random forest surpasses the original accuracy in six datasets and mitigates accuracy drops in the remaining three. Knowledge distillation also improves the accuracy-to-size ratio by an average factor of 18.84 across all datasets. Supported by mathematical foundations and biological analogies, our findings suggest that strategically combining multiple signals can lead to efficient, high-performing transformer models for resource-constrained applications.
△ Less
Submitted 4 January, 2025;
originally announced January 2025.
-
MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
Authors:
Jimin Park,
AHyun Ji,
Minji Park,
Mohammad Saidur Rahman,
Se Eun Oh
Abstract:
Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with…
▽ More
Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model's performance on old data degrades over time.
In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model's hidden representations.
Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario -- where new classes are introduced continuously over multiple tasks -- demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55% on Windows malware samples, significantly outperforming other GR-based models by 28%. This study provides practical insights for advancing GR-based malware classification systems. The implementation is available at \url {https://github.com/MalwareReplayGAN/MalCL}\footnote{The code will be made public upon the presentation of the paper}.
△ Less
Submitted 2 January, 2025;
originally announced January 2025.
-
A Novel Ensemble-Based Deep Learning Model with Explainable AI for Accurate Kidney Disease Diagnosis
Authors:
Md. Arifuzzaman,
Iftekhar Ahmed,
Md. Jalal Uddin Chowdhury,
Shadman Sakib,
Mohammad Shoaib Rahman,
Md. Ebrahim Hossain,
Shakib Absar
Abstract:
Chronic Kidney Disease (CKD) represents a significant global health challenge, characterized by the progressive decline in renal function, leading to the accumulation of waste products and disruptions in fluid balance within the body. Given its pervasive impact on public health, there is a pressing need for effective diagnostic tools to enable timely intervention. Our study delves into the applica…
▽ More
Chronic Kidney Disease (CKD) represents a significant global health challenge, characterized by the progressive decline in renal function, leading to the accumulation of waste products and disruptions in fluid balance within the body. Given its pervasive impact on public health, there is a pressing need for effective diagnostic tools to enable timely intervention. Our study delves into the application of cutting-edge transfer learning models for the early detection of CKD. Leveraging a comprehensive and publicly available dataset, we meticulously evaluate the performance of several state-of-the-art models, including EfficientNetV2, InceptionNetV2, MobileNetV2, and the Vision Transformer (ViT) technique. Remarkably, our analysis demonstrates superior accuracy rates, surpassing the 90% threshold with MobileNetV2 and achieving 91.5% accuracy with ViT. Moreover, to enhance predictive capabilities further, we integrate these individual methodologies through ensemble modeling, resulting in our ensemble model exhibiting a remarkable 96% accuracy in the early detection of CKD. This significant advancement holds immense promise for improving clinical outcomes and underscores the critical role of machine learning in addressing complex medical challenges.
△ Less
Submitted 12 December, 2024;
originally announced December 2024.
-
Automated Toll Management System Using RFID and Image Processing
Authors:
Raihan Ahmed,
Shahed Chowdhury Omi,
Md. Sadman Rahman,
Niaz Rahman Bhuiyan
Abstract:
Traveling through toll plazas is one of the primary causes of congestion, as identified in recent studies. Electronic Toll Collection (ETC) systems can mitigate this problem. This experiment focuses on enhancing the security of ETC using RFID tags and number plate verification. For number plate verification, image processing is employed, and a CNN classifier is implemented to detect vehicle regist…
▽ More
Traveling through toll plazas is one of the primary causes of congestion, as identified in recent studies. Electronic Toll Collection (ETC) systems can mitigate this problem. This experiment focuses on enhancing the security of ETC using RFID tags and number plate verification. For number plate verification, image processing is employed, and a CNN classifier is implemented to detect vehicle registration numbers. Based on the registered number, a notification email is sent to the respective owner for toll fee payment within a specific timeframe to avoid fines. Additionally, toll fees are automatically deducted in real-time from the owner's balance. This system benefits travelers by eliminating the need to queue for toll payment, thereby reducing delays and improving convenience.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.
-
Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features
Authors:
MD Shaikh Rahman,
Syed Maudud E Rabbi,
Muhammad Mahbubur Rashid
Abstract:
Approximate Nearest Neighbor search is one of the keys to high-scale data retrieval performance in many applications. The work is a bridge between feature extraction and ANN indexing through fine-tuning a ResNet50 model with various ANN methods: FAISS and Annoy. We evaluate the systems with respect to indexing time, memory usage, query time, precision, recall, F1-score, and Recall@5 on a custom im…
▽ More
Approximate Nearest Neighbor search is one of the keys to high-scale data retrieval performance in many applications. The work is a bridge between feature extraction and ANN indexing through fine-tuning a ResNet50 model with various ANN methods: FAISS and Annoy. We evaluate the systems with respect to indexing time, memory usage, query time, precision, recall, F1-score, and Recall@5 on a custom image dataset. FAISS's Product Quantization can achieve a precision of 98.40% with low memory usage at 0.24 MB index size, and Annoy is the fastest, with average query times of 0.00015 seconds, at a slight cost to accuracy. These results reveal trade-offs among speed, accuracy, and memory efficiency and offer actionable insights into the optimization of feature-based image retrieval systems. This study will serve as a blueprint for constructing actual retrieval pipelines and be built on fine-tuned deep learning networks and associated ANN methods.
△ Less
Submitted 2 December, 2024;
originally announced December 2024.
-
Algorithms for Parameterized String Matching with Mismatches
Authors:
Apurba Saha,
Iftekhar Hakim Kaowsar,
Mahdi Hasnat Siyam,
M. Sohel Rahman
Abstract:
Two strings are considered to have parameterized matching when there exists a bijection of the parameterized alphabet onto itself such that it transforms one string to another. Parameterized matching has application in software duplication detection, image processing, and computational biology. We consider the problem for which a pattern $p$, a text $t$ and a mismatch tolerance limit $k$ is given…
▽ More
Two strings are considered to have parameterized matching when there exists a bijection of the parameterized alphabet onto itself such that it transforms one string to another. Parameterized matching has application in software duplication detection, image processing, and computational biology. We consider the problem for which a pattern $p$, a text $t$ and a mismatch tolerance limit $k$ is given and the goal is to find all positions in text $t$, for which pattern $p$, parameterized matches with $|p|$ length substrings of $t$ with at most $k$ mismatches. Our main result is an algorithm for this problem with $O(α^2 n\log n + n α^2 \sqrtα \log \left( n α\right))$ time complexity, where $n = |t|$ and $α= |Σ|$ which is improving for $k=\tildeΩ(|Σ|^{5/3})$ the algorithm by Hazay, Lewenstein and Sokol. We also present a hashing based probabilistic algorithm for this problem when $k = 1$ with $O \left( n \log n \right)$ time complexity, which we believe is algorithmically beautiful.
△ Less
Submitted 29 November, 2024;
originally announced December 2024.
-
Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification
Authors:
MD Shaikh Rahman,
Feiroz Humayara,
Syed Maudud E Rabbi,
Muhammad Mahbubur Rashid
Abstract:
That datasets that are used in todays research are especially vast in the medical field. Different types of medical images such as X-rays, MRI, CT scan etc. take up large amounts of space. This volume of data introduces challenges like accessing and retrieving specific images due to the size of the database. An efficient image retrieval system is essential as the database continues to grow to save…
▽ More
That datasets that are used in todays research are especially vast in the medical field. Different types of medical images such as X-rays, MRI, CT scan etc. take up large amounts of space. This volume of data introduces challenges like accessing and retrieving specific images due to the size of the database. An efficient image retrieval system is essential as the database continues to grow to save time and resources. In this paper, we propose an approach to medical image retrieval using DenseNet for feature extraction and use FAISS for similarity search. DenseNet is well-suited for feature extraction in complex medical images and FAISS enables efficient handling of high-dimensional data in large-scale datasets. Unlike existing methods focused solely on classification accuracy, our method prioritizes both retrieval speed and diagnostic relevance, addressing a critical gap in real-time case comparison for radiologists. We applied the classification of breast cancer images using the BIRADS system. We utilized DenseNet's powerful feature representation and FAISSs efficient indexing capabilities to achieve high precision and recall in retrieving relevant images for diagnosis. We experimented on a dataset of 2006 images from the Categorized Digital Database for Low Energy and Subtracted Contrast Enhanced Spectral Mammography (CDD-CESM) images available on The Cancer Imaging Archive (TCIA). Our method outperforms conventional retrieval techniques, achieving a precision of 80% at k=5 for BIRADS classification. The dataset includes annotated CESM images and medical reports, providing a comprehensive foundation for our research.
△ Less
Submitted 3 November, 2024;
originally announced November 2024.
-
Privacy-Preserving Customer Churn Prediction Model in the Context of Telecommunication Industry
Authors:
Joydeb Kumar Sana,
M Sohel Rahman,
M Saifur Rahman
Abstract:
Data is the main fuel of a successful machine learning model. A dataset may contain sensitive individual records e.g. personal health records, financial data, industrial information, etc. Training a model using this sensitive data has become a new privacy concern when someone uses third-party cloud computing. Trained models also suffer privacy attacks which leads to the leaking of sensitive inform…
▽ More
Data is the main fuel of a successful machine learning model. A dataset may contain sensitive individual records e.g. personal health records, financial data, industrial information, etc. Training a model using this sensitive data has become a new privacy concern when someone uses third-party cloud computing. Trained models also suffer privacy attacks which leads to the leaking of sensitive information of the training data. This study is conducted to preserve the privacy of training data in the context of customer churn prediction modeling for the telecommunications industry (TCI). In this work, we propose a framework for privacy-preserving customer churn prediction (PPCCP) model in the cloud environment. We have proposed a novel approach which is a combination of Generative Adversarial Networks (GANs) and adaptive Weight-of-Evidence (aWOE). Synthetic data is generated from GANs, and aWOE is applied on the synthetic training dataset before feeding the data to the classification algorithms. Our experiments were carried out using eight different machine learning (ML) classifiers on three openly accessible datasets from the telecommunication sector. We then evaluated the performance using six commonly employed evaluation metrics. In addition to presenting a data privacy analysis, we also performed a statistical significance test. The training and prediction processes achieve data privacy and the prediction classifiers achieve high prediction performance (87.1\% in terms of F-Measure for GANs-aWOE based Naïve Bayes model). In contrast to earlier studies, our suggested approach demonstrates a prediction enhancement of up to 28.9\% and 27.9\% in terms of accuracy and F-measure, respectively.
△ Less
Submitted 3 November, 2024;
originally announced November 2024.
-
NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering
Authors:
Parvez Zamil,
Gollam Rabby,
Md. Sadekur Rahman,
Sören Auer
Abstract:
The growing volume of biomedical scholarly document abstracts presents an increasing challenge in efficiently retrieving accurate and relevant information. To address this, we introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization. Complementing this, we employ…
▽ More
The growing volume of biomedical scholarly document abstracts presents an increasing challenge in efficiently retrieving accurate and relevant information. To address this, we introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization. Complementing this, we employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction. Our approach is evaluated across three configurations: scholarly document abstract retrieval, gold-standard scholarly documents abstract, and gold-standard snippets, consistently outperforming established methods such as RYGH and bio-answer finder. Notably, we demonstrate that extracting answers from scholarly documents abstracts alone can yield high accuracy, underscoring the sufficiency of abstracts for many biomedical queries. Despite its compact size, MiniLM exhibits competitive performance, challenging the prevailing notion that only large, resource-intensive models can handle such complex tasks. Our results, validated across various question types and evaluation batches, highlight the robustness and adaptability of our method in real-world biomedical applications. While our approach shows promise, we identify challenges in handling complex list-type questions and inconsistencies in evaluation metrics. Future work will focus on refining the topic model with more extensive domain-specific datasets, further optimizing MiniLM and utilizing large language models (LLM) to improve both precision and efficiency in biomedical question answering.
△ Less
Submitted 29 October, 2024;
originally announced November 2024.
-
Self-DenseMobileNet: A Robust Framework for Lung Nodule Classification using Self-ONN and Stacking-based Meta-Classifier
Authors:
Md. Sohanur Rahman,
Muhammad E. H. Chowdhury,
Hasib Ryan Rahman,
Mosabber Uddin Ahmed,
Muhammad Ashad Kabir,
Sanjiban Sekhar Roy,
Rusab Sarmun
Abstract:
In this study, we propose a novel and robust framework, Self-DenseMobileNet, designed to enhance the classification of nodules and non-nodules in chest radiographs (CXRs). Our approach integrates advanced image standardization and enhancement techniques to optimize the input quality, thereby improving classification accuracy. To enhance predictive accuracy and leverage the strengths of multiple mo…
▽ More
In this study, we propose a novel and robust framework, Self-DenseMobileNet, designed to enhance the classification of nodules and non-nodules in chest radiographs (CXRs). Our approach integrates advanced image standardization and enhancement techniques to optimize the input quality, thereby improving classification accuracy. To enhance predictive accuracy and leverage the strengths of multiple models, the prediction probabilities from Self-DenseMobileNet were transformed into tabular data and used to train eight classical machine learning (ML) models; the top three performers were then combined via a stacking algorithm, creating a robust meta-classifier that integrates their collective insights for superior classification performance. To enhance the interpretability of our results, we employed class activation mapping (CAM) to visualize the decision-making process of the best-performing model. Our proposed framework demonstrated remarkable performance on internal validation data, achieving an accuracy of 99.28\% using a Meta-Random Forest Classifier. When tested on an external dataset, the framework maintained strong generalizability with an accuracy of 89.40\%. These results highlight a significant improvement in the classification of CXRs with lung nodules.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
A Census-Based Genetic Algorithm for Target Set Selection Problem in Social Networks
Authors:
Md. Samiur Rahman,
Mohammad Shamim Ahsan,
Cheng-Wu Chen,
Vijayakumar Varadarajan
Abstract:
This paper considers the Target Set Selection (TSS) Problem in social networks, a fundamental problem in viral marketing. In the TSS problem, a graph and a threshold value for each vertex of the graph are given. We need to find a minimum size vertex subset to "activate" such that all graph vertices are activated at the end of the propagation process. Specifically, we propose a novel approach calle…
▽ More
This paper considers the Target Set Selection (TSS) Problem in social networks, a fundamental problem in viral marketing. In the TSS problem, a graph and a threshold value for each vertex of the graph are given. We need to find a minimum size vertex subset to "activate" such that all graph vertices are activated at the end of the propagation process. Specifically, we propose a novel approach called "a census-based genetic algorithm" for the TSS problem. In our algorithm, we use the idea of a census to gather and store information about each individual in a population and collect census data from the individuals constructed during the algorithm's execution so that we can achieve greater diversity and avoid premature convergence at locally optimal solutions. We use two distinct census information: (a) for each individual, the algorithm stores how many times it has been identified during the execution (b) for each network node, the algorithm counts how many times it has been included in a solution. The proposed algorithm can also self-adjust by using a parameter specifying the aggressiveness employed in each reproduction method. Additionally, the algorithm is designed to run in a parallelized environment to minimize the computational cost and check each individual's feasibility. Moreover, our algorithm finds the optimal solution in all cases while experimenting on random graphs. Furthermore, we execute the proposed algorithm on 14 large graphs of real-life social network instances from the literature, improving around 9.57 solution size (on average) and 134 vertices (in total) compared to the best solutions obtained in previous studies.
△ Less
Submitted 3 July, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
-
ConVerSum: A Contrastive Learning-based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents
Authors:
Sanzana Karim Lora,
M. Sohel Rahman,
Rifat Shahriyar
Abstract:
Cross-lingual summarization (CLS) is a sophisticated branch in Natural Language Processing that demands models to accurately translate and summarize articles from different source languages. Despite the improvement of the subsequent studies, This area still needs data-efficient solutions along with effective training methodologies. To the best of our knowledge, there is no feasible solution for CL…
▽ More
Cross-lingual summarization (CLS) is a sophisticated branch in Natural Language Processing that demands models to accurately translate and summarize articles from different source languages. Despite the improvement of the subsequent studies, This area still needs data-efficient solutions along with effective training methodologies. To the best of our knowledge, there is no feasible solution for CLS when there is no available high-quality CLS data. In this paper, we propose a novel data-efficient approach, ConVerSum, for CLS leveraging the power of contrastive learning, generating versatile candidate summaries in different languages based on the given source document and contrasting these summaries with reference summaries concerning the given documents. After that, we train the model with a contrastive ranking loss. Then, we rigorously evaluate the proposed approach against current methodologies and compare it to powerful Large Language Models (LLMs)- Gemini, GPT 3.5, and GPT 4o proving our model performs better for low-resource languages' CLS. These findings represent a substantial improvement in the area, opening the door to more efficient and accurate cross-lingual summarizing techniques.
△ Less
Submitted 25 November, 2024; v1 submitted 17 August, 2024;
originally announced August 2024.
-
Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease
Authors:
Nayeem Ahmed,
Md Maruf Rahman,
Md Fatin Ishrak,
Md Imran Kabir Joy,
Md Sanowar Hossain Sabuj,
Md. Sadekur Rahman
Abstract:
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease. A carefully selected dataset of keratoconus, normal, and suspicious cases was used. The models tested include DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19. To maximize model training, bad sample removal, resizing, rescaling, and augmentation wer…
▽ More
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease. A carefully selected dataset of keratoconus, normal, and suspicious cases was used. The models tested include DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19. To maximize model training, bad sample removal, resizing, rescaling, and augmentation were used. The models were trained with similar parameters, activation function, classification function, and optimizer to compare performance. To determine class separation effectiveness, each model was evaluated on accuracy, precision, recall, and F1-score. MobileNetV2 was the best accurate model in identifying keratoconus and normal cases with few misclassifications. InceptionV3 and DenseNet121 both performed well in keratoconus detection, but they had trouble with questionable cases. In contrast, EfficientNetB0, ResNet50, and VGG19 had more difficulty distinguishing dubious cases from regular ones, indicating the need for model refining and development. A detailed comparison of state-of-the-art CNN architectures for automated keratoconus identification reveals each model's benefits and weaknesses. This study shows that advanced deep learning models can enhance keratoconus diagnosis and treatment planning. Future research should explore hybrid models and integrate clinical parameters to improve diagnostic accuracy and robustness in real-world clinical applications, paving the way for more effective AI-driven ophthalmology tools.
△ Less
Submitted 16 August, 2024;
originally announced August 2024.
-
Unidirectional imaging with partially coherent light
Authors:
Guangdong Ma,
Che-Yung Shen,
Jingxi Li,
Luzhe Huang,
Cagatay Isil,
Fazil Onuralp Ardic,
Xilin Yang,
Yuhang Li,
Yuntian Wang,
Md Sadman Sakib Rahman,
Aydogan Ozcan
Abstract:
Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting th…
▽ More
Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting the image formation in the backward direction (B->A) along with low power efficiency. Our reciprocal design features a set of spatially engineered linear diffractive layers that are statistically optimized for partially coherent illumination with a given phase correlation length. Our analyses reveal that when illuminated by a partially coherent beam with a correlation length of ~1.5 w or larger, where w is the wavelength of light, diffractive unidirectional imagers achieve robust performance, exhibiting asymmetric imaging performance between the forward and backward directions - as desired. A partially coherent unidirectional imager designed with a smaller correlation length of less than 1.5 w still supports unidirectional image transmission, but with a reduced figure of merit. These partially coherent diffractive unidirectional imagers are compact (axially spanning less than 75 w), polarization-independent, and compatible with various types of illumination sources, making them well-suited for applications in asymmetric visual information processing and communication.
△ Less
Submitted 10 August, 2024;
originally announced August 2024.
-
CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks
Authors:
Md Sazedur Rahman,
Mohamed Elmahallawy,
Sanjay Madria,
Samuel Frimpong
Abstract:
The adoption of connected and automated vehicles (CAVs) has sparked considerable interest across diverse industries, including public transportation, underground mining, and agriculture sectors. However, CAVs' reliance on sensor readings makes them vulnerable to significant threats. Manipulating these readings can compromise CAV network security, posing serious risks for malicious activities. Alth…
▽ More
The adoption of connected and automated vehicles (CAVs) has sparked considerable interest across diverse industries, including public transportation, underground mining, and agriculture sectors. However, CAVs' reliance on sensor readings makes them vulnerable to significant threats. Manipulating these readings can compromise CAV network security, posing serious risks for malicious activities. Although several anomaly detection (AD) approaches for CAV networks are proposed, they often fail to: i) detect multiple anomalies in specific sensor(s) with high accuracy or F1 score, and ii) identify the specific sensor being attacked. In response, this paper proposes a novel framework tailored to CAV networks, called CAV-AD, for distinguishing abnormal readings amidst multiple anomaly data while identifying malicious sensors. Specifically, CAV-AD comprises two main components: i) A novel CNN model architecture called optimized omni-scale CNN (O-OS-CNN), which optimally selects the time scale by generating all possible kernel sizes for input time series data; ii) An amplification block to increase the values of anomaly readings, enhancing sensitivity for detecting anomalies. Not only that, but CAV-AD integrates the proposed O-OS-CNN with a Kalman filter to instantly identify the malicious sensors. We extensively train CAV-AD using real-world datasets containing both instant and constant attacks, evaluating its performance in detecting intrusions from multiple anomalies, which presents a more challenging scenario. Our results demonstrate that CAV-AD outperforms state-of-the-art methods, achieving an average accuracy of 98% and an average F1 score of 89\%, while accurately identifying the malicious sensors.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Integration of Programmable Diffraction with Digital Neural Networks
Authors:
Md Sadman Sakib Rahman,
Aydogan Ozcan
Abstract:
Optical imaging and sensing systems based on diffractive elements have seen massive advances over the last several decades. Earlier generations of diffractive optical processors were, in general, designed to deliver information to an independent system that was separately optimized, primarily driven by human vision or perception. With the recent advances in deep learning and digital neural network…
▽ More
Optical imaging and sensing systems based on diffractive elements have seen massive advances over the last several decades. Earlier generations of diffractive optical processors were, in general, designed to deliver information to an independent system that was separately optimized, primarily driven by human vision or perception. With the recent advances in deep learning and digital neural networks, there have been efforts to establish diffractive processors that are jointly optimized with digital neural networks serving as their back-end. These jointly optimized hybrid (optical+digital) processors establish a new "diffractive language" between input electromagnetic waves that carry analog information and neural networks that process the digitized information at the back-end, providing the best of both worlds. Such hybrid designs can process spatially and temporally coherent, partially coherent, or incoherent input waves, providing universal coverage for any spatially varying set of point spread functions that can be optimized for a given task, executed in collaboration with digital neural networks. In this article, we highlight the utility of this exciting collaboration between engineered and programmed diffraction and digital neural networks for a diverse range of applications. We survey some of the major innovations enabled by the push-pull relationship between analog wave processing and digital neural networks, also covering the significant benefits that could be reaped through the synergy between these two complementary paradigms.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas
Authors:
SM Shaqib,
Alaya Parvin Alo,
Shahriar Sultan Ramit,
Afraz Ul Haque Rupak,
Sadman Sadik Khan,
Md. Sadekur Rahman
Abstract:
In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in…
▽ More
In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. The MAE was 3.5 and RMSE was 4.22 between the predicted speed of our model and the actual speed or the ground truth measured by the speedometer Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Wind Power Prediction across Different Locations using Deep Domain Adaptive Learning
Authors:
Md Saiful Islam Sajol,
Md Shazid Islam,
A S M Jahid Hasan,
Md Saydur Rahman,
Jubair Yusuf
Abstract:
Accurate prediction of wind power is essential for the grid integration of this intermittent renewable source and aiding grid planners in forecasting available wind capacity. Spatial differences lead to discrepancies in climatological data distributions between two geographically dispersed regions, consequently making the prediction task more difficult. Thus, a prediction model that learns from th…
▽ More
Accurate prediction of wind power is essential for the grid integration of this intermittent renewable source and aiding grid planners in forecasting available wind capacity. Spatial differences lead to discrepancies in climatological data distributions between two geographically dispersed regions, consequently making the prediction task more difficult. Thus, a prediction model that learns from the data of a particular climatic region can suffer from being less robust. A deep neural network (DNN) based domain adaptive approach is proposed to counter this drawback. Effective weather features from a large set of weather parameters are selected using a random forest approach. A pre-trained model from the source domain is utilized to perform the prediction task, assuming no source data is available during target domain prediction. The weights of only the last few layers of the DNN model are updated throughout the task, keeping the rest of the network unchanged, making the model faster compared to the traditional approaches. The proposed approach demonstrates higher accuracy ranging from 6.14% to even 28.44% compared to the traditional non-adaptive method.
△ Less
Submitted 18 May, 2024;
originally announced May 2024.
-
Quantum Secure Anonymous Communication Networks
Authors:
Mohammad Saidur Rahman,
Stephen DiAdamo,
Miralem Mehic,
Charles Fleming
Abstract:
Anonymous communication networks (ACNs) enable Internet browsing in a way that prevents the accessed content from being traced back to the user. This allows a high level of privacy, protecting individuals from being tracked by advertisers or governments, for example. The Tor network, a prominent example of such a network, uses a layered encryption scheme to encapsulate data packets, using Tor node…
▽ More
Anonymous communication networks (ACNs) enable Internet browsing in a way that prevents the accessed content from being traced back to the user. This allows a high level of privacy, protecting individuals from being tracked by advertisers or governments, for example. The Tor network, a prominent example of such a network, uses a layered encryption scheme to encapsulate data packets, using Tor nodes to obscure the routing process before the packets enter the public Internet. While Tor is capable of providing substantial privacy, its encryption relies on schemes, such as RSA and Diffie-Hellman for distributing symmetric keys, which are vulnerable to quantum computing attacks and are currently in the process of being phased out.
To overcome the threat, we propose a quantum-resistant alternative to RSA and Diffie-Hellman for distributing symmetric keys, namely, quantum key distribution (QKD). Standard QKD networks depend on trusted nodes to relay keys across long distances, however, reliance on trusted nodes in the quantum network does not meet the criteria necessary for establishing a Tor circuit in the ACN. We address this issue by developing a protocol and network architecture that integrates QKD without the need for trusted nodes, thus meeting the requirements of the Tor network and creating a quantum-secure anonymous communication network.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
DeepLocalization: Using change point detection for Temporal Action Localization
Authors:
Mohammed Shaiqur Rahman,
Ibne Farabi Shihab,
Lynna Chu,
Anuj Sharma
Abstract:
In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior. Utilizing the power of advanced deep learning methodologies, our objective is to tackle the critical issue of distracted driving-a significant factor contributing to road accidents. Our strategy employs a dual approach: leveragi…
▽ More
In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior. Utilizing the power of advanced deep learning methodologies, our objective is to tackle the critical issue of distracted driving-a significant factor contributing to road accidents. Our strategy employs a dual approach: leveraging Graph-Based Change-Point Detection for pinpointing actions in time alongside a Video Large Language Model (Video-LLM) for precisely categorizing activities. Through careful prompt engineering, we customize the Video-LLM to adeptly handle driving activities' nuances, ensuring its classification efficacy even with sparse data. Engineered to be lightweight, our framework is optimized for consumer-grade GPUs, making it vastly applicable in practical scenarios. We subjected our method to rigorous testing on the SynDD2 dataset, a complex benchmark for distracted driving behaviors, where it demonstrated commendable performance-achieving 57.5% accuracy in event classification and 51% in event detection. These outcomes underscore the substantial promise of DeepLocalization in accurately identifying diverse driver behaviors and their temporal occurrences, all within the bounds of limited computational resources.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
The 8th AI City Challenge
Authors:
Shuo Wang,
David C. Anastasiu,
Zheng Tang,
Ming-Ching Chang,
Yue Yao,
Liang Zheng,
Mohammed Shaiqur Rahman,
Meenakshi S. Arya,
Anuj Sharma,
Pranamesh Chakraborty,
Sanjita Prajapati,
Quan Kong,
Norimasa Kobori,
Munkhjargal Gochoo,
Munkh-Erdene Otgonbold,
Fady Alnajjar,
Ganzorig Batnasan,
Ping-Yang Chen,
Jun-Wei Hsieh,
Xunlei Wu,
Sameer Satish Pusegaonkar,
Yizhou Wang,
Sujit Biswas,
Rama Chellappa
Abstract:
The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)…
▽ More
The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Unification of Secret Key Generation and Wiretap Channel Transmission
Authors:
Yingbo Hua,
Md Saydur Rahman
Abstract:
This paper presents further insights into a recently developed round-trip communication scheme called ``Secret-message Transmission by Echoing Encrypted Probes (STEEP)''. A legitimate wireless channel between a multi-antenna user (Alice) and a single-antenna user (Bob) in the presence of a multi-antenna eavesdropper (Eve) is focused on. STEEP does not require full-duplex, channel reciprocity or Ev…
▽ More
This paper presents further insights into a recently developed round-trip communication scheme called ``Secret-message Transmission by Echoing Encrypted Probes (STEEP)''. A legitimate wireless channel between a multi-antenna user (Alice) and a single-antenna user (Bob) in the presence of a multi-antenna eavesdropper (Eve) is focused on. STEEP does not require full-duplex, channel reciprocity or Eve's channel state information, but is able to yield a positive secrecy rate in bits per channel use between Alice and Bob in every channel coherence period as long as Eve's receive channel is not noiseless. This secrecy rate does not diminish as coherence time increases. Various statistical behaviors of STEEP's secrecy capacity due to random channel fading are also illustrated.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry
Authors:
Deepti Raghavan,
Keshav Santhanam,
Muhammad Shahir Rahman,
Nayani Modugula,
Luis Gaspar Schroeder,
Maximilien Cura,
Houjun Liu,
Pratiksha Thaker,
Philip Levis,
Matei Zaharia
Abstract:
Compound AI applications chain together subcomponents such as generative language models, document retrievers, and embedding models. Applying traditional systems optimizations such as parallelism and pipelining in compound AI systems is difficult because each component has different constraints in terms of the granularity and type of data that it ingests. New data is often generated during interme…
▽ More
Compound AI applications chain together subcomponents such as generative language models, document retrievers, and embedding models. Applying traditional systems optimizations such as parallelism and pipelining in compound AI systems is difficult because each component has different constraints in terms of the granularity and type of data that it ingests. New data is often generated during intermediate computations, and text streams may be split into smaller, independent fragments (such as documents to sentences) which may then be re-aggregated at later parts of the computation. Due to this complexity, existing systems to serve compound AI queries do not fully take advantage of parallelism and pipelining opportunities.
We present Alto, a framework that automatically optimizes execution of compound AI queries through streaming and parallelism. Bento introduces a new abstraction called nested ancestry, a metadata hierarchy that allows the system to correctly track partial outputs and aggregate data across the heterogeneous constraints of the components of compound AI applications. This metadata is automatically inferred from the programming model, allowing developers to express complex dataflow patterns without needing to reason manually about the details of routing and aggregation. Implementations of four applications in Alto outperform or match implementations in LangGraph, a popular existing AI programming framework. Alto implementations match or improve latency by between 10-30%.
△ Less
Submitted 20 June, 2025; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Location Agnostic Adaptive Rain Precipitation Prediction using Deep Learning
Authors:
Md Shazid Islam,
Md Saydur Rahman,
Md Saad Ul Haque,
Farhana Akter Tumpa,
Md Sanzid Bin Hossain,
Abul Al Arabi
Abstract:
Rain precipitation prediction is a challenging task as it depends on weather and meteorological features which vary from location to location. As a result, a prediction model that performs well at one location does not perform well at other locations due to the distribution shifts. In addition, due to global warming, the weather patterns are changing very rapidly year by year which creates the pos…
▽ More
Rain precipitation prediction is a challenging task as it depends on weather and meteorological features which vary from location to location. As a result, a prediction model that performs well at one location does not perform well at other locations due to the distribution shifts. In addition, due to global warming, the weather patterns are changing very rapidly year by year which creates the possibility of ineffectiveness of those models even at the same location as time passes. In our work, we have proposed an adaptive deep learning-based framework in order to provide a solution to the aforementioned challenges. Our method can generalize the model for the prediction of precipitation for any location where the methods without adaptation fail. Our method has shown 43.51%, 5.09%, and 38.62% improvement after adaptation using a deep neural network for predicting the precipitation of Paris, Los Angeles, and Tokyo, respectively.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Comparative Evaluation of Weather Forecasting using Machine Learning Models
Authors:
Md Saydur Rahman,
Farhana Akter Tumpa,
Md Shazid Islam,
Abul Al Arabi,
Md Sanzid Bin Hossain,
Md Saad Ul Haque
Abstract:
Gaining a deeper understanding of weather and being able to predict its future conduct have always been considered important endeavors for the growth of our society. This research paper explores the advancements in understanding and predicting nature's behavior, particularly in the context of weather forecasting, through the application of machine learning algorithms. By leveraging the power of ma…
▽ More
Gaining a deeper understanding of weather and being able to predict its future conduct have always been considered important endeavors for the growth of our society. This research paper explores the advancements in understanding and predicting nature's behavior, particularly in the context of weather forecasting, through the application of machine learning algorithms. By leveraging the power of machine learning, data mining, and data analysis techniques, significant progress has been made in this field. This study focuses on analyzing the contributions of various machine learning algorithms in predicting precipitation and temperature patterns using a 20-year dataset from a single weather station in Dhaka city. Algorithms such as Gradient Boosting, AdaBoosting, Artificial Neural Network, Stacking Random Forest, Stacking Neural Network, and Stacking KNN are evaluated and compared based on their performance metrics, including Confusion matrix measurements. The findings highlight remarkable achievements and provide valuable insights into their performances and features correlation.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Location Agnostic Source-Free Domain Adaptive Learning to Predict Solar Power Generation
Authors:
Md Shazid Islam,
A S M Jahid Hasan,
Md Saydur Rahman,
Jubair Yusuf,
Md Saiful Islam Sajol,
Farhana Akter Tumpa
Abstract:
The prediction of solar power generation is a challenging task due to its dependence on climatic characteristics that exhibit spatial and temporal variability. The performance of a prediction model may vary across different places due to changes in data distribution, resulting in a model that works well in one region but not in others. Furthermore, as a consequence of global warming, there is a no…
▽ More
The prediction of solar power generation is a challenging task due to its dependence on climatic characteristics that exhibit spatial and temporal variability. The performance of a prediction model may vary across different places due to changes in data distribution, resulting in a model that works well in one region but not in others. Furthermore, as a consequence of global warming, there is a notable acceleration in the alteration of weather patterns on an annual basis. This phenomenon introduces the potential for diminished efficacy of existing models, even within the same geographical region, as time progresses. In this paper, a domain adaptive deep learning-based framework is proposed to estimate solar power generation using weather features that can solve the aforementioned challenges. A feed-forward deep convolutional network model is trained for a known location dataset in a supervised manner and utilized to predict the solar power of an unknown location later. This adaptive data-driven approach exhibits notable advantages in terms of computing speed, storage efficiency, and its ability to improve outcomes in scenarios where state-of-the-art non-adaptive methods fail. Our method has shown an improvement of $10.47 \%$, $7.44 \%$, $5.11\%$ in solar power prediction accuracy compared to best performing non-adaptive method for California (CA), Florida (FL) and New York (NY), respectively.
△ Less
Submitted 6 February, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
BadODD: Bangladeshi Autonomous Driving Object Detection Dataset
Authors:
Mirza Nihal Baig,
Rony Hajong,
Mahdi Murshed Patwary,
Mohammad Shahidur Rahman,
Husne Ara Chowdhury
Abstract:
We propose a comprehensive dataset for object detection in diverse driving environments across 9 districts in Bangladesh. The dataset, collected exclusively from smartphone cameras, provided a realistic representation of real-world scenarios, including day and night conditions. Most existing datasets lack suitable classes for autonomous navigation on Bangladeshi roads, making it challenging for re…
▽ More
We propose a comprehensive dataset for object detection in diverse driving environments across 9 districts in Bangladesh. The dataset, collected exclusively from smartphone cameras, provided a realistic representation of real-world scenarios, including day and night conditions. Most existing datasets lack suitable classes for autonomous navigation on Bangladeshi roads, making it challenging for researchers to develop models that can handle the intricacies of road scenarios. To address this issue, the authors proposed a new set of classes based on characteristics rather than local vehicle names. The dataset aims to encourage the development of models that can handle the unique challenges of Bangladeshi road scenarios for the effective deployment of autonomous vehicles. The dataset did not consist of any online images to simulate real-world conditions faced by autonomous vehicles. The classification of vehicles is challenging because of the diverse range of vehicles on Bangladeshi roads, including those not found elsewhere in the world. The proposed classification system is scalable and can accommodate future vehicles, making it a valuable resource for researchers in the autonomous vehicle sector.
△ Less
Submitted 19 January, 2024;
originally announced January 2024.
-
Subwavelength Imaging using a Solid-Immersion Diffractive Optical Processor
Authors:
Jingtian Hu,
Kun Liao,
Niyazi Ulas Dinc,
Carlo Gigli,
Bijie Bai,
Tianyi Gan,
Xurong Li,
Hanlong Chen,
Xilin Yang,
Yuhang Li,
Cagatay Isil,
Md Sadman Sakib Rahman,
Jingxi Li,
Xiaoyong Hu,
Mona Jarrahi,
Demetri Psaltis,
Aydogan Ozcan
Abstract:
Phase imaging is widely used in biomedical imaging, sensing, and material characterization, among other fields. However, direct imaging of phase objects with subwavelength resolution remains a challenge. Here, we demonstrate subwavelength imaging of phase and amplitude objects based on all-optical diffractive encoding and decoding. To resolve subwavelength features of an object, the diffractive im…
▽ More
Phase imaging is widely used in biomedical imaging, sensing, and material characterization, among other fields. However, direct imaging of phase objects with subwavelength resolution remains a challenge. Here, we demonstrate subwavelength imaging of phase and amplitude objects based on all-optical diffractive encoding and decoding. To resolve subwavelength features of an object, the diffractive imager uses a thin, high-index solid-immersion layer to transmit high-frequency information of the object to a spatially-optimized diffractive encoder, which converts/encodes high-frequency information of the input into low-frequency spatial modes for transmission through air. The subsequent diffractive decoder layers (in air) are jointly designed with the encoder using deep-learning-based optimization, and communicate with the encoder layer to create magnified images of input objects at its output, revealing subwavelength features that would otherwise be washed away due to diffraction limit. We demonstrate that this all-optical collaboration between a diffractive solid-immersion encoder and the following decoder layers in air can resolve subwavelength phase and amplitude features of input objects in a highly compact design. To experimentally demonstrate its proof-of-concept, we used terahertz radiation and developed a fabrication method for creating monolithic multi-layer diffractive processors. Through these monolithically fabricated diffractive encoder-decoder pairs, we demonstrated phase-to-intensity transformations and all-optically reconstructed subwavelength phase features of input objects by directly transforming them into magnified intensity features at the output. This solid-immersion-based diffractive imager, with its compact and cost-effective design, can find wide-ranging applications in bioimaging, endoscopy, sensing and materials characterization.
△ Less
Submitted 16 January, 2024;
originally announced January 2024.
-
Detecting Anomalies in Blockchain Transactions using Machine Learning Classifiers and Explainability Analysis
Authors:
Mohammad Hasan,
Mohammad Shahriar Rahman,
Helge Janicke,
Iqbal H. Sarker
Abstract:
As the use of Blockchain for digital payments continues to rise in popularity, it also becomes susceptible to various malicious attacks. Successfully detecting anomalies within Blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in Blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. A…
▽ More
As the use of Blockchain for digital payments continues to rise in popularity, it also becomes susceptible to various malicious attacks. Successfully detecting anomalies within Blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in Blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating eXplainable Artificial Intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The Shapley Additive exPlanation (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we have introduced an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances TPR and ROC-AUC scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and FPR scores.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition
Authors:
Md. Zarif Ul Alam,
Md Saiful Islam,
Ehsan Hoque,
M Saifur Rahman
Abstract:
Parkinson's disease (PD) is a neuro-degenerative disorder that affects movement, speech, and coordination. Timely diagnosis and treatment can improve the quality of life for PD patients. However, access to clinical diagnosis is limited in low and middle income countries (LMICs). Therefore, development of automated screening tools for PD can have a huge social impact, particularly in the public hea…
▽ More
Parkinson's disease (PD) is a neuro-degenerative disorder that affects movement, speech, and coordination. Timely diagnosis and treatment can improve the quality of life for PD patients. However, access to clinical diagnosis is limited in low and middle income countries (LMICs). Therefore, development of automated screening tools for PD can have a huge social impact, particularly in the public health sector. In this paper, we present PULSAR, a novel method to screen for PD from webcam-recorded videos of the finger-tapping task from the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS). PULSAR is trained and evaluated on data collected from 382 participants (183 self-reported as PD patients). We used an adaptive graph convolutional neural network to dynamically learn the spatio temporal graph edges specific to the finger-tapping task. We enhanced this idea with a multi stream adaptive convolution model to learn features from different modalities of data critical to detect PD, such as relative location of the finger joints, velocity and acceleration of tapping. As the labels of the videos are self-reported, there could be cases of undiagnosed PD in the non-PD labeled samples. We leveraged the idea of Positive Unlabeled (PU) Learning that does not need labeled negative data. Our experiments show clear benefit of modeling the problem in this way. PULSAR achieved 80.95% accuracy in validation set and a mean accuracy of 71.29% (2.49% standard deviation) in independent test, despite being trained with limited amount of data. This is specially promising as labeled data is scarce in health care sector. We hope PULSAR will make PD screening more accessible to everyone. The proposed techniques could be extended for assessment of other movement disorders, such as ataxia, and Huntington's disease.
△ Less
Submitted 16 February, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
An Efficient Deep Learning-based approach for Recognizing Agricultural Pests in the Wild
Authors:
Mohtasim Hadi Rafi,
Mohammad Ratul Mahjabin,
Md Sabbir Rahman
Abstract:
One of the biggest challenges that the farmers go through is to fight insect pests during agricultural product yields. The problem can be solved easily and avoid economic losses by taking timely preventive measures. This requires identifying insect pests in an easy and effective manner. Most of the insect species have similarities between them. Without proper help from the agriculturist academicia…
▽ More
One of the biggest challenges that the farmers go through is to fight insect pests during agricultural product yields. The problem can be solved easily and avoid economic losses by taking timely preventive measures. This requires identifying insect pests in an easy and effective manner. Most of the insect species have similarities between them. Without proper help from the agriculturist academician it is very challenging for the farmers to identify the crop pests accurately. To address this issue we have done extensive experiments considering different methods to find out the best method among all. This paper presents a detailed overview of the experiments done on mainly a robust dataset named IP102 including transfer learning with finetuning, attention mechanism and custom architecture. Some example from another dataset D0 is also shown to show robustness of our experimented techniques.
△ Less
Submitted 25 October, 2023;
originally announced October 2023.
-
Leveraging Complementary Attention maps in vision transformers for OCT image analysis
Authors:
Haz Sameen Shahgir,
Tanjeem Azwad Zaman,
Khondker Salman Sayeed,
Md. Asif Haider,
Sheikh Saifur Rahman Jony,
M. Sohel Rahman
Abstract:
Optical Coherence Tomography (OCT) scan yields all possible cross-section images of a retina for detecting biomarkers linked to optical defects. Due to the high volume of data generated, an automated and reliable biomarker detection pipeline is necessary as a primary screening stage.
We outline our new state-of-the-art pipeline for identifying biomarkers from OCT scans. In collaboration with tra…
▽ More
Optical Coherence Tomography (OCT) scan yields all possible cross-section images of a retina for detecting biomarkers linked to optical defects. Due to the high volume of data generated, an automated and reliable biomarker detection pipeline is necessary as a primary screening stage.
We outline our new state-of-the-art pipeline for identifying biomarkers from OCT scans. In collaboration with trained ophthalmologists, we identify local and global structures in biomarkers. Through a comprehensive and systematic review of existing vision architectures, we evaluate different convolution and attention mechanisms for biomarker detection. We find that MaxViT, a hybrid vision transformer combining convolution layers with strided attention, is better suited for local feature detection, while EVA-02, a standard vision transformer leveraging pure attention and large-scale knowledge distillation, excels at capturing global features. We ensemble the predictions of both models to achieve first place in the IEEE Video and Image Processing Cup 2023 competition on OCT biomarker detection, achieving a patient-wise F1 score of 0.8527 in the final phase of the competition, scoring 3.8\% higher than the next best solution. Finally, we used knowledge distillation to train a single MaxViT to outperform our ensemble at a fraction of the computation cost.
△ Less
Submitted 30 May, 2025; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks
Authors:
Xilin Yang,
Md Sadman Sakib Rahman,
Bijie Bai,
Jingxi Li,
Aydogan Ozcan
Abstract:
As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees-of-freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent l…
▽ More
As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees-of-freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D2NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are non-negative, acting on diffraction-limited optical intensity patterns at the input field-of-view (FOV). Here, we expand the use of spatially incoherent D2NNs to complex-valued information processing for executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the multiplication of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption using incoherent illumination. The findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.