-
VR-FuseNet: A Fusion of Heterogeneous Fundus Data and Explainable Deep Network for Diabetic Retinopathy Classification
Authors:
Shamim Rahim Refat,
Ziyan Shirin Raha,
Shuvashis Sarker,
Faika Fairuj Preotee,
MD. Musfikur Rahman,
Tashreef Muhammad,
Mohammad Shafiul Alam
Abstract:
Diabetic retinopathy, a leading cause of blindness among diabetic patients, is a severe eye condition in which diabetes damages the retinal blood vessels; left untreated, it can lead to vision loss and blindness. Early and accurate detection is key to timely intervention and to halting disease progression. This paper presents a comprehensive approach to automated diabetic retinopathy detection by proposing a new hybrid deep learning model called VR-FuseNet. To address the limitations of existing methods, including dataset imbalance, limited diversity, and generalization issues, the paper presents a hybrid dataset created from five publicly available diabetic retinopathy datasets. Essential preprocessing techniques, such as SMOTE for class balancing and CLAHE for image enhancement, are applied systematically to the dataset to improve its robustness and generalizability. The proposed VR-FuseNet model combines the strengths of two state-of-the-art convolutional neural networks: VGG19, which captures fine-grained spatial features, and ResNet50V2, which is known for its deep hierarchical feature extraction. This fusion improves diagnostic performance and achieves an accuracy of 91.824%. The model outperforms the individual architectures on all performance metrics, demonstrating the effectiveness of hybrid feature extraction in diabetic retinopathy classification tasks. To make the proposed model more clinically useful and interpretable, the paper incorporates multiple XAI techniques. These techniques generate visual explanations that indicate the retinal features affecting the model's prediction, such as microaneurysms, hemorrhages, and exudates, so that clinicians can interpret and validate the results.
Submitted 21 June, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
Bridging Dialects: Translating Standard Bangla to Regional Variants Using Neural Models
Authors:
Md. Arafat Alam Khandaker,
Ziyan Shirin Raha,
Bidyarthi Paul,
Tashreef Muhammad
Abstract:
The Bangla language includes many regional dialects, adding to its cultural richness. Translating standard Bangla into regional dialects is challenging because of significant variations in vocabulary, pronunciation, and sentence structure across regions such as Chittagong, Sylhet, Barishal, Noakhali, and Mymensingh. These dialects, though vital to local identities, lack representation in technological applications. This study addresses this gap by translating standard Bangla into these dialects using neural machine translation (NMT) models, including BanglaT5, mT5, and mBART50. The work is motivated by the need to preserve linguistic diversity and improve communication among dialect speakers. The models were fine-tuned using the "Vashantor" dataset, containing 32,500 sentences across various dialects, and evaluated using the Character Error Rate (CER) and Word Error Rate (WER) metrics. BanglaT5 demonstrated superior performance with a CER of 12.3% and a WER of 15.7%, highlighting its effectiveness in capturing dialectal nuances. The outcomes of this research contribute to the development of inclusive language technologies that support regional dialects and promote linguistic diversity.
Submitted 10 January, 2025;
originally announced January 2025.
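The abstract above reports Character Error Rate (CER) and Word Error Rate (WER). As a minimal sketch of how such metrics are commonly computed (edit distance divided by reference length), assuming the standard Levenshtein definition rather than the authors' exact evaluation script, the function names `edit_distance`, `cer`, and `wer` below are illustrative only:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For the reference "the cat sat" and the hypothesis "the cat sit", one word out of three and one character out of eleven differ, so `wer` returns 1/3 and `cer` returns 1/11.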
-
Explainable AI-Enhanced Deep Learning for Pumpkin Leaf Disease Detection: A Comparative Analysis of CNN Architectures
Authors:
Md. Arafat Alam Khandaker,
Ziyan Shirin Raha,
Shifat Islam,
Tashreef Muhammad
Abstract:
Pumpkin leaf diseases are significant threats to agricultural productivity, requiring timely and precise diagnosis for effective management. Traditional identification methods are laborious and susceptible to human error, emphasizing the necessity for automated solutions. This study employs the "Pumpkin Leaf Disease Dataset", which comprises 2000 high-resolution images separated into five categories: downy mildew, powdery mildew, mosaic disease, bacterial leaf spot, and healthy leaves. The dataset was rigorously assembled from several agricultural fields to ensure a strong representation for model training. We explored several deep learning architectures, including DenseNet201, DenseNet121, DenseNet169, Xception, ResNet50, ResNet101, and InceptionResNetV2, and observed that ResNet50 performed most effectively, with an accuracy of 90.5% and comparable precision, recall, and F1-Score. We used Explainable AI (XAI) approaches such as Grad-CAM, Grad-CAM++, Score-CAM, and Layer-CAM to provide meaningful representations of the model's decision-making process, which improved understanding of and trust in automated disease diagnostics. These findings demonstrate ResNet50's potential to revolutionize pumpkin leaf disease detection, allowing for earlier and more accurate treatments.
Submitted 10 April, 2025; v1 submitted 9 January, 2025;
originally announced January 2025.
-
From Images to Insights: Transforming Brain Cancer Diagnosis with Explainable AI
Authors:
Md. Arafat Alam Khandaker,
Ziyan Shirin Raha,
Salehin Bin Iqbal,
M. F. Mridha,
Jungpil Shin
Abstract:
Brain cancer represents a major challenge in medical diagnostics, requiring precise and timely detection for effective treatment. Diagnosis initially relies on the proficiency of radiologists, which can cause difficulties and risks when that expertise is scarce. Despite the use of imaging resources, brain cancer diagnosis often remains difficult, time-consuming, and vulnerable to intraclass variability. This study presents the Bangladesh Brain Cancer MRI Dataset, containing 6,056 MRI images organized into three categories: Brain Tumor, Brain Glioma, and Brain Menin. The dataset was collected from several hospitals in Bangladesh, providing a diverse and realistic sample for research. We implemented advanced deep learning models, and DenseNet169 achieved exceptional results, with accuracy, precision, recall, and F1-Score all reaching 0.9983. In addition, Explainable AI (XAI) methods including GradCAM, GradCAM++, ScoreCAM, and LayerCAM were employed to provide visual representations of the decision-making processes of the models. In the context of brain cancer, these techniques highlight DenseNet169's potential to enhance diagnostic accuracy while simultaneously offering transparency, facilitating early diagnosis and better patient outcomes.
Submitted 9 January, 2025;
originally announced January 2025.
-
Enhancing MRI-Based Classification of Alzheimer's Disease with Explainable 3D Hybrid Compact Convolutional Transformers
Authors:
Arindam Majee,
Avisek Gupta,
Sourav Raha,
Swagatam Das
Abstract:
Alzheimer's disease (AD), characterized by progressive cognitive decline and memory loss, presents a formidable global health challenge, underscoring the critical importance of early and precise diagnosis for timely interventions and enhanced patient outcomes. While MRI scans provide valuable insights into brain structures, traditional analysis methods often struggle to discern intricate 3D patterns crucial for AD identification. Addressing this challenge, we introduce an alternative end-to-end deep learning model, the 3D Hybrid Compact Convolutional Transformers (3D HCCT). By synergistically combining convolutional neural networks (CNNs) and vision transformers (ViTs), the 3D HCCT adeptly captures both local features and long-range relationships within 3D MRI scans. Extensive evaluations on a prominent AD benchmark dataset, ADNI, demonstrate the 3D HCCT's superior performance, surpassing state-of-the-art CNN and transformer-based methods in classification accuracy. Its robust generalization capability and interpretability mark a significant stride in AD classification from 3D MRI scans, promising more accurate and reliable diagnoses for improved patient care and superior clinical outcomes.
Submitted 24 March, 2024;
originally announced March 2024.
-
Improving Surrogate Model Robustness to Perturbations for Dynamical Systems Through Machine Learning and Data Assimilation
Authors:
Abhishek Ajayakumar,
Soumyendu Raha
Abstract:
Many real-world systems are modelled using complex ordinary differential equations (ODEs). However, the dimensionality of these systems can make them challenging to analyze. Dimensionality reduction techniques such as Proper Orthogonal Decomposition (POD) can be used in such cases, but the resulting reduced order models are susceptible to perturbations in the input. We propose a novel framework that combines machine learning and data assimilation techniques to improve surrogate models so that they handle perturbations in input data effectively. Through rigorous experiments on dynamical systems modelled on graphs, we demonstrate that our framework substantially improves the accuracy of surrogate models under input perturbations. Furthermore, we evaluate the framework's efficacy on alternative surrogate models, including neural ODEs, and the empirical results consistently show enhanced performance.
Submitted 25 February, 2025; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Data driven approach to sparsification of reaction diffusion complex network systems
Authors:
Abhishek Ajayakumar,
Soumyendu Raha
Abstract:
Graph sparsification is an area of interest in computer science and applied mathematics. Sparsification of a graph, in general, aims to reduce the number of edges in the network while preserving specific properties of the graph, such as cuts and subgraph counts. Computing the sparsest cuts of a graph is known to be NP-hard, and sparsification routines exist for generating linear-sized sparsifiers in almost quadratic running time $O(n^{2 + \varepsilon})$. Consequently, obtaining a sparsifier can be a computationally demanding task, and the complexity varies with the level of sparsity required. In this study, we extend the concept of sparsification to the realm of reaction-diffusion complex systems. We aim to address the challenge of reducing the number of edges in the network while preserving the underlying flow dynamics. To tackle this problem, we adopt a relaxed approach that considers only a subset of trajectories. We map the network sparsification problem to a data assimilation problem on a Reduced Order Model (ROM) space with constraints targeted at preserving the eigenmodes of the Laplacian matrix under perturbations. The Laplacian matrix ($L = D - A$) is the difference between the diagonal matrix of degrees ($D$) and the graph's adjacency matrix ($A$). We propose approximations to the eigenvalues and eigenvectors of the Laplacian matrix subject to perturbations for computational feasibility and include a custom function based on these approximations as a constraint on the data assimilation framework. We demonstrate the extension of our framework to achieve sparsity in parameter sets for Neural Ordinary Differential Equations (neural ODEs).
Submitted 16 September, 2023; v1 submitted 19 March, 2023;
originally announced March 2023.
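The abstract above defines the graph Laplacian as $L = D - A$. A minimal sketch of that definition in plain Python follows; the function name `graph_laplacian` and the list-of-lists adjacency representation are assumptions made for illustration, not part of the paper:

```python
def graph_laplacian(adjacency):
    """Return L = D - A for a square adjacency matrix given as nested lists.

    D is the diagonal matrix whose i-th entry is the sum of row i of A
    (the degree of node i for an unweighted, undirected graph).
    """
    n = len(adjacency)
    laplacian = []
    for i in range(n):
        degree = sum(adjacency[i])
        row = [-adjacency[i][j] for j in range(n)]  # the -A part
        row[i] += degree                            # add D on the diagonal
        laplacian.append(row)
    return laplacian


# Path graph 0 - 1 - 2
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
L = graph_laplacian(path)  # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
```

Every row of `L` sums to zero, which is the property that makes the constant vector an eigenvector with eigenvalue 0. Removing an edge (zeroing a symmetric pair of entries in `adjacency`) perturbs `L`, which is the kind of change whose effect on the eigenmodes the abstract aims to constrain.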
-
Prediction of dynamical systems using geometric constraints imposed by observations
Authors:
Saurabh Dixit,
Soumyendu Raha
Abstract:
The solution of an Ordinary Differential Equation (ODE) model of a dynamical system may not agree with the system's observed values. Often this discrepancy can be attributed to unmodeled forcings in the evolution rule of the dynamical system. In this article, an approach for data-based model improvement is described which exploits the geometric constraints imposed by the system observations to estimate these unmodeled terms. The nominal model is augmented with these extra forcing terms to make predictions. This approach is applied to navigational satellite orbit prediction, bringing the error for a 2-hour prediction down to approximately 12% of the error obtained with the nominal force model. In another example, improved temperature predictions over the nominal heat equation are obtained for one-dimensional conduction.
Submitted 12 August, 2021;
originally announced August 2021.
-
Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization
Authors:
Farhad Merchant,
Tarun Vatwani,
Anupam Chattopadhyay,
Soumyendu Raha,
S K Nandy,
Ranjani Narayan,
Rainer Leupers
Abstract:
We present an efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt over state-of-the-art realizations on multicore and General Purpose Graphics Processing Units (GPGPUs). GGR is an improvement over the classical Givens Rotation (GR) operation that can annihilate multiple elements of rows and columns of an input matrix simultaneously. GGR takes 33% fewer multiplications than GR. For a custom implementation of GGR, we identify macro operations in GGR and realize them on a Reconfigurable Data-path (RDP) tightly coupled to the pipeline of a Processing Element (PE). In the PE, GGR attains a speed-up of 1.1x over the Modified Householder Transform (MHT) presented in the literature. For a parallel realization of GGR, we use REDEFINE, a scalable massively parallel Coarse-grained Reconfigurable Architecture, and show that the speed-up attained is commensurate with the hardware resources in REDEFINE. GGR also outperforms General Matrix Multiplication (gemm) by 10% in terms of Gflops/watt, which is counter-intuitive.
Submitted 23 March, 2018; v1 submitted 14 March, 2018;
originally announced March 2018.
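For context on the abstract above, here is a minimal sketch of QR factorization with the classical Givens Rotation (GR), which annihilates one element per rotation. It is not the paper's Generalized Givens Rotation; the function names `givens` and `qr_givens` and the nested-list matrix layout are assumptions made for illustration:

```python
import math


def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def qr_givens(matrix):
    """Factor an m x n matrix (nested lists) as Q R using classical Givens rotations."""
    m, n = len(matrix), len(matrix[0])
    r = [[float(x) for x in row] for row in matrix]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j below the diagonal, working from the bottom up.
        for i in range(m - 1, j, -1):
            c, s = givens(r[i - 1][j], r[i][j])
            # Apply the rotation to rows i-1 and i of R.
            for k in range(n):
                upper, lower = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * upper + s * lower
                r[i][k] = -s * upper + c * lower
            # Accumulate Q <- Q G^T by rotating columns i-1 and i of Q.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r
```

Each inner rotation touches only two rows, so zeroing a full column below the diagonal takes one rotation per element; the abstract's GGR is described as annihilating multiple elements simultaneously instead.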
-
Achieving Efficient Realization of Kalman Filter on CGRA through Algorithm-Architecture Co-design
Authors:
Farhad Merchant,
Tarun Vatwani,
Anupam Chattopadhyay,
Soumyendu Raha,
S K Nandy,
Ranjani Narayan
Abstract:
In this paper, we present an efficient realization of the Kalman Filter (KF) that can achieve up to 65% of the theoretical peak performance of the underlying architecture platform. KF is realized using the Modified Faddeeva Algorithm (MFA) as a basic building block due to its versatility, and the REDEFINE Coarse Grained Reconfigurable Architecture (CGRA) is used as a platform for experiments since REDEFINE is capable of supporting realization of a set of algorithmic compute structures at run-time on a Reconfigurable Data-path (RDP). We perform several hardware and software based optimizations in the realization of KF to achieve a 116% improvement in terms of Gflops over the first realization of KF. Overall, with the presented approach for KF, a 4-105x performance improvement in terms of Gflops/watt over several academically and commercially available realizations of KF is attained. In REDEFINE, we show that our implementation is scalable and that the performance attained is commensurate with the underlying hardware resources.
Submitted 10 February, 2018;
originally announced February 2018.
-
Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization
Authors:
Farhad Merchant,
Tarun Vatwani,
Anupam Chattopadhyay,
Soumyendu Raha,
S K Nandy,
Ranjani Narayan
Abstract:
We present an efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design, where we achieve a performance improvement of 3-90x in terms of Gflops/watt over state-of-the-art multicore, General Purpose Graphics Processing Units (GPGPUs), Field Programmable Gate Arrays (FPGAs), and ClearSpeed CSX700. Theoretical and experimental analysis of classical HT is performed to find opportunities to exhibit a higher degree of parallelism, where parallelism is quantified as the number of parallel operations per level in the Directed Acyclic Graph (DAG) of the transform. Based on theoretical analysis of classical HT, an opportunity to re-arrange computations in the classical HT is identified; this results in Modified HT (MHT), which is shown to exhibit 1.33x higher parallelism than classical HT. Experiments on off-the-shelf multicore and GPGPUs for HT and MHT suggest that MHT is capable of achieving slightly better or equal performance compared to classical HT based QR factorization realizations in the optimized software packages for Dense Linear Algebra (DLA). We implement MHT on a customized platform for DLA and show that MHT achieves 1.3x better performance than a native implementation of classical HT on the same accelerator. For custom realization of HT and MHT based QR factorization, we also identify macro operations in the DAGs of HT and MHT that are realized on a Reconfigurable Data-path (RDP). We also observe that, due to the re-arrangement of computations in MHT, the custom realization of MHT is capable of achieving a 12% better performance improvement over multicore and GPGPUs than the performance improvement reported by General Matrix Multiplication (GEMM) over highly tuned DLA software packages for multicore and GPGPUs, which is counter-intuitive.
Submitted 13 December, 2016;
originally announced December 2016.
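As background for the abstract above, the following is a minimal sketch of a single classical Householder reflection, the building block of HT based QR factorization. It is not the paper's Modified HT; the function name `householder_reflect` and the plain-list vector representation are assumptions made for illustration:

```python
import math


def householder_reflect(x):
    """Apply H = I - 2 v v^T / (v^T v) to x so that H x is a multiple of e1.

    The sign of alpha is chosen opposite to x[0] to avoid cancellation
    when forming v = x - alpha * e1.
    """
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    if norm_x == 0.0:
        return list(x)  # nothing to reflect
    alpha = -math.copysign(norm_x, x[0])
    v = list(x)
    v[0] -= alpha
    vtv = sum(vi * vi for vi in v)
    vtx = sum(vi * xi for vi, xi in zip(v, x))
    scale = 2.0 * vtx / vtv
    return [xi - scale * vi for xi, vi in zip(x, v)]
```

For `x = [3.0, 4.0]` the reflection yields `[-5.0, 0.0]`: the norm is preserved and every entry below the first is annihilated in one step, in contrast to Givens rotations, which zero one entry at a time.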
-
Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design
Authors:
Farhad Merchant,
Anupam Chattopadhyay,
Soumyendu Raha,
S K Nandy,
Ranjani Narayan
Abstract:
Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate the performance of those applications. Performance in such tuned packages is attained through tuning of several algorithmic and architectural parameters, such as the number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, the sizes of the memories in the memory hierarchy of the underlying platform, the bandwidth of the memory, and the structure of the compute resources in the underlying platform. In this paper, we closely investigate the impact of the Floating Point Unit (FPU) micro-architecture on performance tuning of BLAS and LAPACK. We present a theoretical analysis of the pipeline depth of different floating point units, namely the multiplier, adder, square root, and divider, followed by a characterization of BLAS and LAPACK to determine several parameters required in the theoretical framework for deciding the optimum pipeline depth of the floating point operations. A simple design of a Processing Element (PE) is presented, and it is shown that the PE outperforms the most recent custom realizations of BLAS and LAPACK by 1.1X to 1.5X in Gflops/W, and 1.9X to 2.1X in Gflops/mm^2.
Submitted 13 November, 2017; v1 submitted 27 October, 2016;
originally announced October 2016.
-
Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design
Authors:
Farhad Merchant,
Tarun Vatwani,
Anupam Chattopadhyay,
Soumyendu Raha,
S K Nandy,
Ranjani Narayan
Abstract:
Basic Linear Algebra Subprograms (BLAS) play a key role in high performance and scientific computing applications. Experimentally, yesteryear multicore and General Purpose Graphics Processing Units (GPGPUs) are capable of achieving up to 15 to 57% of the theoretical peak performance at 65W to 240W respectively for compute bound operations like Double/Single Precision General Matrix Multiplication (XGEMM). For bandwidth bound operations like Single/Double precision Matrix-vector Multiplication (XGEMV), the performance is merely 5 to 7% of the theoretical peak performance in multicores and GPGPUs respectively. Achieving performance in BLAS requires moving away from conventional wisdom and evolving towards customized accelerators tailored for BLAS through algorithm-architecture co-design. In this paper, we present acceleration of Level-1 (vector operations), Level-2 (matrix-vector operations), and Level-3 (matrix-matrix operations) BLAS through algorithm-architecture co-design on a Coarse-grained Reconfigurable Architecture (CGRA). We choose REDEFINE CGRA as a platform for our experiments since REDEFINE can be adapted to support a domain of interest through tailor-made Custom Function Units (CFUs). For efficient sequential realization of BLAS, we present the design of a Processing Element (PE) and perform micro-architectural enhancements in the PE to achieve up to 74% of the theoretical peak performance of the PE in DGEMM, 40% in DGEMV, and 20% in double precision inner product (DDOT). We attach this PE to REDEFINE CGRA as a CFU and show the scalability of our solution. Finally, we show a performance improvement of 3-140x in the PE over commercially available Intel micro-architectures, ClearSpeed CSX700, FPGA, and Nvidia GPGPUs.
Submitted 27 November, 2016; v1 submitted 20 October, 2016;
originally announced October 2016.
-
Performance metrics in a hybrid MPI-OpenMP based molecular dynamics simulation with short-range interactions
Authors:
Anirban Pal,
Abhishek Agarwala,
Soumyendu Raha,
Baidurya Bhattacharya
Abstract:
We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation-intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short-range interatomic interactions. The algorithm is discussed in the context of nanoindentation of Chromium films with carbon indenters, using the Embedded Atom Method potential for Cr-Cr interactions and the Morse potential for Cr-C interactions. We study the performance of our algorithm for a range of MPI-thread combinations and find the performance to depend strongly on the computational task and on load sharing in the multicore processor. The algorithm scaled poorly with MPI, and our hybrid schemes were observed to outperform the pure message passing scheme despite utilizing the same number of processors or cores in the cluster. The speed-up achieved by our algorithm compared favourably with that achieved by standard MD packages.
Submitted 25 July, 2015;
originally announced July 2015.