-
When Two LLMs Debate, Both Think They'll Win
Authors:
Pradyumna Shyama Prasad,
Minh Nhat Nguyen
Abstract:
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum str…
▽ More
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as debates progressed, debaters increased their win probabilities, averaging 83% by the final round. (3) Mutual overestimation: in 61.7% of debates, both sides simultaneously claimed >=75% probability of victory, a logical impossibility. (4) Persistent self-debate bias: models debating identical copies increased confidence from 64.1% to 75.2%; even when explicitly informed their chance of winning was exactly 50%, confidence still rose (from 50.0% to 57.1%). (5) Misaligned private reasoning: models' private scratchpad thoughts sometimes differed from their public confidence ratings, raising concerns about faithfulness of chain-of-thought reasoning. These results suggest LLMs lack the ability to accurately self-assess or update their beliefs in dynamic, multi-turn tasks; a major concern as LLMs are now increasingly deployed without careful review in assistant and agentic roles.
Code for our experiments is available at https://github.com/pradyuprasad/llms_overconfidence
△ Less
Submitted 9 June, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
-
Know Or Not: a library for evaluating out-of-knowledge base robustness
Authors:
Jessica Foo,
Pradyumna Shyama Prasad,
Shaun Khoo
Abstract:
While the capabilities of large language models (LLMs) have progressed significantly, their use in high-stakes applications have been limited due to risks of hallucination. One key approach in reducing hallucination is retrieval-augmented generation (RAG), but even in such setups, LLMs may still hallucinate when presented with questions outside of the knowledge base. Such behavior is unacceptable…
▽ More
While the capabilities of large language models (LLMs) have progressed significantly, their use in high-stakes applications have been limited due to risks of hallucination. One key approach in reducing hallucination is retrieval-augmented generation (RAG), but even in such setups, LLMs may still hallucinate when presented with questions outside of the knowledge base. Such behavior is unacceptable in high-stake applications where LLMs are expected to abstain from answering queries it does not have sufficient context on. In this work, we present a novel methodology for systematically evaluating out-of-knowledge base (OOKB) robustness of LLMs (whether LLMs know or do not know) in the RAG setting, without the need for manual annotation of gold standard answers. We implement our methodology in knowornot, an open-source library that enables users to develop their own customized evaluation data and pipelines for OOKB robustness. knowornot comprises four main features. Firstly, it provides a unified, high-level API that streamlines the process of setting up and running robustness benchmarks. Secondly, its modular architecture emphasizes extensibility and flexibility, allowing users to easily integrate their own LLM clients and RAG settings. Thirdly, its rigorous data modeling design ensures experiment reproducibility, reliability and traceability. Lastly, it implements a comprehensive suite of tools for users to customize their pipelines. We demonstrate the utility of knowornot by developing a challenging benchmark, PolicyBench, which spans four Question-Answer (QA) chatbots on government policies, and analyze its OOKB robustness. The source code of knowornot is available https://github.com/govtech-responsibleai/KnowOrNot.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
Scaling Probabilistic Circuits via Data Partitioning
Authors:
Jonas Seng,
Florian Peter Busch,
Pooja Prasad,
Devendra Singh Dhami,
Martin Mundt,
Kristian Kersting
Abstract:
Probabilistic circuits (PCs) enable us to learn joint distributions over a set of random variables and to perform various probabilistic queries in a tractable fashion. Though the tractability property allows PCs to scale beyond non-tractable models such as Bayesian Networks, scaling training and inference of PCs to larger, real-world datasets remains challenging. To remedy the situation, we show h…
▽ More
Probabilistic circuits (PCs) enable us to learn joint distributions over a set of random variables and to perform various probabilistic queries in a tractable fashion. Though the tractability property allows PCs to scale beyond non-tractable models such as Bayesian Networks, scaling training and inference of PCs to larger, real-world datasets remains challenging. To remedy the situation, we show how PCs can be learned across multiple machines by recursively partitioning a distributed dataset, thereby unveiling a deep connection between PCs and federated learning (FL). This leads to federated circuits (FCs) -- a novel and flexible federated learning (FL) framework that (1) allows one to scale PCs on distributed learning environments (2) train PCs faster and (3) unifies for the first time horizontal, vertical, and hybrid FL in one framework by re-framing FL as a density estimation problem over distributed datasets. We demonstrate FC's capability to scale PCs on various large-scale datasets. Also, we show FC's versatility in handling horizontal, vertical, and hybrid FL within a unified framework on multiple classification tasks.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields
Authors:
Jing Yang,
Pratusha Bhuvana Prasad,
Qing Zhang,
Yajie Zhao
Abstract:
Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task due to their intricate shapes formed by concave features, hollow engravings and diverse surfaces, resulting in inter-reflection and occlusion when photographed. Moreover, issues like lens flare and overexposure can arise from interference from secondary reflections and limitations of hardwar…
▽ More
Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task due to their intricate shapes formed by concave features, hollow engravings and diverse surfaces, resulting in inter-reflection and occlusion when photographed. Moreover, issues like lens flare and overexposure can arise from interference from secondary reflections and limitations of hardware even in professional studios. In this paper, we propose a novel approach using polarized reflectance field capture and a comprehensive statistical analysis algorithm to obtain highly accurate surface normals (within 0.1mm/px) and spatially-varying reflectance data, including albedo, specular separation, roughness, and anisotropy parameters for realistic rendering and analysis. Our algorithm removes image artifacts via analytical modeling and further employs both an initial step and an optimization step computed on the whole image collection to further enhance the precision of per-pixel surface reflectance and normal measurement. We showcase the captured shapes and reflectance of diverse objects with a wide material range, spanning from highly diffuse to highly glossy - a challenge unaddressed by prior techniques. Our approach enhances downstream applications by offering precise measurements for realistic rendering and provides a valuable training dataset for emerging research in inverse rendering. We will release the polarized reflectance fields of several captured objects with this work.
△ Less
Submitted 12 December, 2024;
originally announced December 2024.
-
Exploring Large Language Models for Specialist-level Oncology Care
Authors:
Anil Palepu,
Vikram Dhillon,
Polly Niravath,
Wei-Hung Weng,
Preethi Prasad,
Khaled Saab,
Ryutaro Tanno,
Yong Cheng,
Hanh Mai,
Ethan Burns,
Zainub Ajmal,
Kavita Kulkarni,
Philip Mansfield,
Dale Webster,
Joelle Barral,
Juraj Gottweis,
Mike Schaekermann,
S. Sara Mahdavi,
Vivek Natarajan,
Alan Karthikesalingam,
Tao Tu
Abstract:
Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast…
▽ More
Large language models (LLMs) have shown remarkable progress in encoding clinical knowledge and responding to complex medical queries with appropriate clinical reasoning. However, their applicability in subspecialist or complex medical settings remains underexplored. In this work, we probe the performance of AMIE, a research conversational diagnostic AI system, in the subspecialist domain of breast oncology care without specific fine-tuning to this challenging domain. To perform this evaluation, we curated a set of 50 synthetic breast cancer vignettes representing a range of treatment-naive and treatment-refractory cases and mirroring the key information available to a multidisciplinary tumor board for decision-making (openly released with this work). We developed a detailed clinical rubric for evaluating management plans, including axes such as the quality of case summarization, safety of the proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery and hormonal therapy. To improve performance, we enhanced AMIE with the inference-time ability to perform web search retrieval to gather relevant and up-to-date clinical knowledge and refine its responses with a multi-stage self-critique pipeline. We compare response quality of AMIE with internal medicine trainees, oncology fellows, and general oncology attendings under both automated and specialist clinician evaluations. In our evaluations, AMIE outperformed trainees and fellows demonstrating the potential of the system in this challenging and important domain. We further demonstrate through qualitative examples, how systems such as AMIE might facilitate conversational interactions to assist clinicians in their decision making. However, AMIE's performance was overall inferior to attending oncologists suggesting that further research is needed prior to consideration of prospective uses.
△ Less
Submitted 5 November, 2024;
originally announced November 2024.
-
Localized Gaussian Splatting Editing with Contextual Awareness
Authors:
Hanyuan Xiao,
Yingshu Chen,
Huajian Huang,
Haolin Xiong,
Jing Yang,
Pratusha Prasad,
Yajie Zhao
Abstract:
Recent text-guided generation of individual 3D object has achieved great success using diffusion priors. However, these methods are not suitable for object insertion and replacement tasks as they do not consider the background, leading to illumination mismatches within the environment. To bridge the gap, we introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS)…
▽ More
Recent text-guided generation of individual 3D object has achieved great success using diffusion priors. However, these methods are not suitable for object insertion and replacement tasks as they do not consider the background, leading to illumination mismatches within the environment. To bridge the gap, we introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS) representation. Our key observation is that inpainting by the state-of-the-art conditional 2D diffusion model is consistent with background in lighting. To leverage the prior knowledge from the well-trained diffusion models for 3D object generation, our approach employs a coarse-to-fine objection optimization pipeline with inpainted views. In the first coarse step, we achieve image-to-3D lifting given an ideal inpainted view. The process employs 3D-aware diffusion prior from a view-conditioned diffusion model, which preserves illumination present in the conditioning image. To acquire an ideal inpainted image, we introduce an Anchor View Proposal (AVP) algorithm to find a single view that best represents the scene illumination in target region. In the second Texture Enhancement step, we introduce a novel Depth-guided Inpainting Score Distillation Sampling (DI-SDS), which enhances geometry and texture details with the inpainting diffusion prior, beyond the scope of the 3D-aware diffusion prior knowledge in the first coarse step. DI-SDS not only provides fine-grained texture enhancement, but also urges optimization to respect scene lighting. Our approach efficiently achieves local editing with global illumination consistency without explicitly modeling light transport. We demonstrate robustness of our method by evaluating editing in real scenes containing explicit highlight and shadows, and compare against the state-of-the-art text-to-3D editing methods.
△ Less
Submitted 31 July, 2024;
originally announced August 2024.
-
Leveraging Synthetic Data for Generalizable and Fair Facial Action Unit Detection
Authors:
Liupei Lu,
Yufeng Yin,
Yuming Gu,
Yizhen Wu,
Pratusha Prasad,
Yajie Zhao,
Mohammad Soleymani
Abstract:
Facial action unit (AU) detection is a fundamental block for objective facial expression analysis. Supervised learning approaches require a large amount of manual labeling which is costly. The limited labeled data are also not diverse in terms of gender which can affect model fairness. In this paper, we propose to use synthetically generated data and multi-source domain adaptation (MSDA) to addres…
▽ More
Facial action unit (AU) detection is a fundamental block for objective facial expression analysis. Supervised learning approaches require a large amount of manual labeling which is costly. The limited labeled data are also not diverse in terms of gender which can affect model fairness. In this paper, we propose to use synthetically generated data and multi-source domain adaptation (MSDA) to address the problems of the scarcity of labeled data and the diversity of subjects. Specifically, we propose to generate a diverse dataset through synthetic facial expression re-targeting by transferring the expressions from real faces to synthetic avatars. Then, we use MSDA to transfer the AU detection knowledge from a real dataset and the synthetic dataset to a target dataset. Instead of aligning the overall distributions of different domains, we propose Paired Moment Matching (PM2) to align the features of the paired real and synthetic data with the same facial expression. To further improve gender fairness, PM2 matches the features of the real data with a female and a male synthetic image. Our results indicate that synthetic data and the proposed model improve both AU detection performance and fairness across genders, demonstrating its potential to solve AU detection in-the-wild.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
FedPNN: One-shot Federated Classification via Evolving Clustering Method and Probabilistic Neural Network hybrid
Authors:
Polaki Durga Prasad,
Yelleti Vivek,
Vadlamani Ravi
Abstract:
Protecting data privacy is paramount in the fields such as finance, banking, and healthcare. Federated Learning (FL) has attracted widespread attention due to its decentralized, distributed training and the ability to protect the privacy while obtaining a global shared model. However, FL presents challenges such as communication overhead, and limited resource capability. This motivated us to propo…
▽ More
Protecting data privacy is paramount in the fields such as finance, banking, and healthcare. Federated Learning (FL) has attracted widespread attention due to its decentralized, distributed training and the ability to protect the privacy while obtaining a global shared model. However, FL presents challenges such as communication overhead, and limited resource capability. This motivated us to propose a two-stage federated learning approach toward the objective of privacy protection, which is a first-of-its-kind study as follows: (i) During the first stage, the synthetic dataset is generated by employing two different distributions as noise to the vanilla conditional tabular generative adversarial neural network (CTGAN) resulting in modified CTGAN, and (ii) In the second stage, the Federated Probabilistic Neural Network (FedPNN) is developed and employed for building globally shared classification model. We also employed synthetic dataset metrics to check the quality of the generated synthetic dataset. Further, we proposed a meta-clustering algorithm whereby the cluster centers obtained from the clients are clustered at the server for training the global model. Despite PNN being a one-pass learning classifier, its complexity depends on the training data size. Therefore, we employed a modified evolving clustering method (ECM), another one-pass algorithm to cluster the training data thereby increasing the speed further. Moreover, we conducted sensitivity analysis by varying Dthr, a hyperparameter of ECM at the server and client, one at a time. The effectiveness of our approach is validated on four finance and medical datasets.
△ Less
Submitted 8 April, 2023;
originally announced April 2023.
-
A Comprehensive Study on Machine Learning Methods to Increase the Prediction Accuracy of Classifiers and Reduce the Number of Medical Tests Required to Diagnose Alzheimer'S Disease
Authors:
Md. Sharifur Rahman,
Professor Girijesh Prasad
Abstract:
Alzheimer's patients gradually lose their ability to think, behave, and interact with others. Medical history, laboratory tests, daily activities, and personality changes can all be used to diagnose the disorder. A series of time-consuming and expensive tests are used to diagnose the illness. The most effective way to identify Alzheimer's disease is using a Random-forest classifier in this study,…
▽ More
Alzheimer's patients gradually lose their ability to think, behave, and interact with others. Medical history, laboratory tests, daily activities, and personality changes can all be used to diagnose the disorder. A series of time-consuming and expensive tests are used to diagnose the illness. The most effective way to identify Alzheimer's disease is using a Random-forest classifier in this study, along with various other Machine Learning techniques. The main goal of this study is to fine-tune the classifier to detect illness with fewer tests while maintaining a reasonable disease discovery accuracy. We successfully identified the condition in almost 94% of cases using four of the thirty frequently utilized indicators.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Scalable mRMR feature selection to handle high dimensional datasets: Vertical partitioning based Iterative MapReduce framework
Authors:
Yelleti Vivek,
P. S. V. S. Sai Prasad
Abstract:
While building machine learning models, Feature selection (FS) stands out as an essential preprocessing step used to handle the uncertainty and vagueness in the data. Recently, the minimum Redundancy and Maximum Relevance (mRMR) approach has proven to be effective in obtaining the irredundant feature subset. Owing to the generation of voluminous datasets, it is essential to design scalable solutio…
▽ More
While building machine learning models, Feature selection (FS) stands out as an essential preprocessing step used to handle the uncertainty and vagueness in the data. Recently, the minimum Redundancy and Maximum Relevance (mRMR) approach has proven to be effective in obtaining the irredundant feature subset. Owing to the generation of voluminous datasets, it is essential to design scalable solutions using distributed/parallel paradigms. MapReduce solutions are proven to be one of the best approaches to designing fault-tolerant and scalable solutions. This work analyses the existing MapReduce approaches for mRMR feature selection and identifies the limitations thereof. In the current study, we proposed VMR_mRMR, an efficient vertical partitioning-based approach using a memorization approach, thereby overcoming the extant approaches limitations. The experiment analysis says that VMR_mRMR significantly outperformed extant approaches and achieved a better computational gain (C.G). In addition, we also conducted a comparative analysis with the horizontal partitioning approach HMR_mRMR [1] to assess the strengths and limitations of the proposed approach.
△ Less
Submitted 24 July, 2024; v1 submitted 21 August, 2022;
originally announced August 2022.
-
Deep Learning for Size and Microscope Feature Extraction and Classification in Oral Cancer: Enhanced Convolution Neural Network
Authors:
Prakrit Joshi,
Omar Hisham Alsadoon,
Abeer Alsadoon,
Nada AlSallami,
Tarik A. Rashid,
P. W. C. Prasad,
Sami Haddad
Abstract:
Background and Aim: Over-fitting issue has been the reason behind deep learning technology not being successfully implemented in oral cancer images classification. The aims of this research were reducing overfitting for accurately producing the required dimension reduction feature map through Deep Learning algorithm using Convolutional Neural Network. Methodology: The proposed system consists of E…
▽ More
Background and Aim: Over-fitting issue has been the reason behind deep learning technology not being successfully implemented in oral cancer images classification. The aims of this research were reducing overfitting for accurately producing the required dimension reduction feature map through Deep Learning algorithm using Convolutional Neural Network. Methodology: The proposed system consists of Enhanced Convolutional Neural Network that uses an autoencoder technique to increase the efficiency of the feature extraction process and compresses information. In this technique, unpooling and deconvolution is done to generate the input data to minimize the difference between input and output data. Moreover, it extracts characteristic features from the input data set to regenerate input data from those features by learning a network to reduce overfitting. Results: Different accuracy and processing time value is achieved while using different sample image group of Confocal Laser Endomicroscopy (CLE) images. The results showed that the proposed solution is better than the current system. Moreover, the proposed system has improved the classification accuracy by 5~ 5.5% on average and reduced the average processing time by 20 ~ 30 milliseconds. Conclusion: The proposed system focuses on the accurate classification of oral cancer cells of different anatomical locations from the CLE images. Finally, this study enhances the accuracy and processing time using the autoencoder method that solves the overfitting problem.
△ Less
Submitted 6 August, 2022;
originally announced August 2022.
-
A novel solution of deep learning for enhanced support vector machine for predicting the onset of type 2 diabetes
Authors:
Marmik Shrestha,
Omar Hisham Alsadoon,
Abeer Alsadoon,
Thair Al-Dala'in,
Tarik A. Rashid,
P. W. C. Prasad,
Ahmad Alrubaie
Abstract:
Type 2 Diabetes is one of the most major and fatal diseases known to human beings, where thousands of people are subjected to the onset of Type 2 Diabetes every year. However, the diagnosis and prevention of Type 2 Diabetes are relatively costly in today's scenario; hence, the use of machine learning and deep learning techniques is gaining momentum for predicting the onset of Type 2 Diabetes. This…
▽ More
Type 2 Diabetes is one of the most major and fatal diseases known to human beings, where thousands of people are subjected to the onset of Type 2 Diabetes every year. However, the diagnosis and prevention of Type 2 Diabetes are relatively costly in today's scenario; hence, the use of machine learning and deep learning techniques is gaining momentum for predicting the onset of Type 2 Diabetes. This research aims to increase the accuracy and Area Under the Curve (AUC) metric while improving the processing time for predicting the onset of Type 2 Diabetes. The proposed system consists of a deep learning technique that uses the Support Vector Machine (SVM) algorithm along with the Radial Base Function (RBF) along with the Long Short-term Memory Layer (LSTM) for prediction of onset of Type 2 Diabetes. The proposed solution provides an average accuracy of 86.31 % and an average AUC value of 0.8270 or 82.70 %, with an improvement of 3.8 milliseconds in the processing. Radial Base Function (RBF) kernel and the LSTM layer enhance the prediction accuracy and AUC metric from the current industry standard, making it more feasible for practical use without compromising the processing time.
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
Deep Learning Neural Network for Lung Cancer Classification: Enhanced Optimization Function
Authors:
Bhoj Raj Pandit,
Abeer Alsadoon,
P. W. C. Prasad,
Sarmad Al Aloussi,
Tarik A. Rashid,
Omar Hisham Alsadoon,
Oday D. Jerew
Abstract:
Background and Purpose: Convolutional neural network is widely used for image recognition in the medical area at nowadays. However, overall accuracy in predicting lung tumor is low and the processing time is high as the error occurred while reconstructing the CT image. The aim of this work is to increase the overall prediction accuracy along with reducing processing time by using multispace image…
▽ More
Background and Purpose: Convolutional neural network is widely used for image recognition in the medical area at nowadays. However, overall accuracy in predicting lung tumor is low and the processing time is high as the error occurred while reconstructing the CT image. The aim of this work is to increase the overall prediction accuracy along with reducing processing time by using multispace image in pooling layer of convolution neural network. Methodology: The proposed method has the autoencoder system to improve the overall accuracy, and to predict lung cancer by using multispace image in pooling layer of convolution neural network and Adam Algorithm for optimization. First, the CT images were pre-processed by feeding image to the convolution filter and down sampled by using max pooling. Then, features are extracted using the autoencoder model based on convolutional neural network and multispace image reconstruction technique is used to reduce error while reconstructing the image which then results improved accuracy to predict lung nodule. Finally, the reconstructed images are taken as input for SoftMax classifier to classify the CT images. Results: The state-of-art and proposed solutions were processed in Python Tensor Flow and It provides significant increase in accuracy in classification of lung cancer to 99.5 from 98.9 and decrease in processing time from 10 frames/second to 12 seconds/second. Conclusion: The proposed solution provides high classification accuracy along with less processing time compared to the state of art. For future research, large dataset can be implemented, and low pixel image can be processed to evaluate the classification
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
A Novel Enhanced Convolution Neural Network with Extreme Learning Machine: Facial Emotional Recognition in Psychology Practices
Authors:
Nitesh Banskota,
Abeer Alsadoon,
P. W. C. Prasad,
Ahmed Dawoud,
Tarik A. Rashid,
Omar Hisham Alsadoon
Abstract:
Facial emotional recognition is one of the essential tools used by recognition psychology to diagnose patients. Face and facial emotional recognition are areas where machine learning is excelling. Facial Emotion Recognition in an unconstrained environment is an open challenge for digital image processing due to different environments, such as lighting conditions, pose variation, yaw motion, and oc…
▽ More
Facial emotional recognition is one of the essential tools used by recognition psychology to diagnose patients. Face and facial emotional recognition are areas where machine learning is excelling. Facial Emotion Recognition in an unconstrained environment is an open challenge for digital image processing due to different environments, such as lighting conditions, pose variation, yaw motion, and occlusions. Deep learning approaches have shown significant improvements in image recognition. However, accuracy and time still need improvements. This research aims to improve facial emotion recognition accuracy during the training session and reduce processing time using a modified Convolution Neural Network Enhanced with Extreme Learning Machine (CNNEELM). The system entails (CNNEELM) improving the accuracy in image registration during the training session. Furthermore, the system recognizes six facial emotions happy, sad, disgust, fear, surprise, and neutral with the proposed CNNEELM model. The study shows that the overall facial emotion recognition accuracy is improved by 2% than the state of art solutions with a modified Stochastic Gradient Descent (SGD) technique. With the Extreme Learning Machine (ELM) classifier, the processing time is brought down to 65ms from 113ms, which can smoothly classify each frame from a video clip at 20fps. With the pre-trained InceptionV3 model, the proposed CNNEELM model is trained with JAFFE, CK+, and FER2013 expression datasets. The simulation results show significant improvements in accuracy and processing time, making the model suitable for the video analysis process. Besides, the study solves the issue of the large processing time required to process the facial images.
△ Less
Submitted 4 August, 2022;
originally announced August 2022.
-
FEATHERS: Federated Architecture and Hyperparameter Search
Authors:
Jonas Seng,
Pooja Prasad,
Martin Mundt,
Devendra Singh Dhami,
Kristian Kersting
Abstract:
Deep neural architectures have profound impact on achieved performance in many of today's AI tasks, yet, their design still heavily relies on human prior knowledge and experience. Neural architecture search (NAS) together with hyperparameter optimization (HO) helps to reduce this dependence. However, state of the art NAS and HO rapidly become infeasible with increasing amount of data being stored…
▽ More
Deep neural architectures have profound impact on achieved performance in many of today's AI tasks, yet, their design still heavily relies on human prior knowledge and experience. Neural architecture search (NAS) together with hyperparameter optimization (HO) helps to reduce this dependence. However, state of the art NAS and HO rapidly become infeasible with increasing amount of data being stored in a distributed fashion, typically violating data privacy regulations such as GDPR and CCPA. As a remedy, we introduce FEATHERS - $\textbf{FE}$derated $\textbf{A}$rchi$\textbf{T}$ecture and $\textbf{H}$yp$\textbf{ER}$parameter $\textbf{S}$earch, a method that not only optimizes both neural architectures and optimization-related hyperparameters jointly in distributed data settings, but further adheres to data privacy through the use of differential privacy (DP). We show that FEATHERS efficiently optimizes architectural and optimization-related hyperparameters alike, while demonstrating convergence on classification tasks at no detriment to model performance when complying with privacy constraints.
△ Less
Submitted 27 March, 2023; v1 submitted 24 June, 2022;
originally announced June 2022.
-
Deep Learning for Sleep Stages Classification: Modified Rectified Linear Unit Activation Function and Modified Orthogonal Weight Initialisation
Authors:
Akriti Bhusal,
Abeer Alsadoon,
P. W. C. Prasad,
Nada Alsalami,
Tarik A. Rashid
Abstract:
Background and Aim: Each stage of sleep can affect human health, and not getting enough sleep at any stage may lead to sleep disorder like parasomnia, apnea, insomnia, etc. Sleep-related diseases could be diagnosed using Convolutional Neural Network Classifier. However, this classifier has not been successfully implemented into sleep stage classification systems due to high complexity and low accu…
▽ More
Background and Aim: Each stage of sleep can affect human health, and not getting enough sleep at any stage may lead to sleep disorder like parasomnia, apnea, insomnia, etc. Sleep-related diseases could be diagnosed using Convolutional Neural Network Classifier. However, this classifier has not been successfully implemented into sleep stage classification systems due to high complexity and low accuracy of classification. The aim of this research is to increase the accuracy and reduce the learning time of Convolutional Neural Network Classifier. Methodology: The proposed system used a modified Orthogonal Convolutional Neural Network and a modified Adam optimisation technique to improve the sleep stage classification accuracy and reduce the gradient saturation problem that occurs due to sigmoid activation function. The proposed system uses Leaky Rectified Linear Unit (ReLU) instead of sigmoid activation function as an activation function. Results: The proposed system called Enhanced Sleep Stage Classification system (ESSC) used six different databases for training and testing the proposed model on the different sleep stages. These databases are University College Dublin database (UCD), Beth Israel Deaconess Medical Center MIT database (MIT-BIH), Sleep European Data Format (EDF), Sleep EDF Extended, Montreal Archive of Sleep Studies (MASS), and Sleep Heart Health Study (SHHS). Our results show that the gradient saturation problem does not exist anymore. The modified Adam optimiser helps to reduce the noise which in turn result in faster convergence time. Conclusion: The convergence speed of ESSC is increased along with better classification accuracy compared to the state of art solution.
△ Less
Submitted 18 February, 2022;
originally announced March 2022.
-
Deep Learning Neural Networks for Emotion Classification from Text: Enhanced Leaky Rectified Linear Unit Activation and Weighted Loss
Authors:
Hui Yang,
Abeer Alsadoon,
P. W. C. Prasad,
Thair Al-Dala'in,
Tarik A. Rashid,
Angelika Maag,
Omar Hisham Alsadoon
Abstract:
Accurate emotion classification for online reviews is vital for business organizations to gain deeper insights into markets. Although deep learning has been successfully implemented in this area, accuracy and processing time are still major problems preventing it from reaching its full potential. This paper proposes an Enhanced Leaky Rectified Linear Unit activation and Weighted Loss (ELReLUWL) al…
▽ More
Accurate emotion classification for online reviews is vital for business organizations to gain deeper insights into markets. Although deep learning has been successfully implemented in this area, accuracy and processing time are still major problems preventing it from reaching its full potential. This paper proposes an Enhanced Leaky Rectified Linear Unit activation and Weighted Loss (ELReLUWL) algorithm for enhanced text emotion classification and faster parameter convergence speed. This algorithm includes the definition of the inflection point and the slope for inputs on the left side of the inflection point to avoid gradient saturation. It also considers the weight of samples belonging to each class to compensate for the influence of data imbalance. Convolutional Neural Network (CNN) combined with the proposed algorithm to increase the classification accuracy and decrease the processing time by eliminating the gradient saturation problem and minimizing the negative effect of data imbalance, demonstrated on a binary sentiment problem. The results show that the proposed solution achieves better classification performance in different data scenarios and different review types. The proposed model takes less convergence time to achieve model optimization with seven epochs against the current convergence time of 11.5 epochs on average. The proposed solution improves accuracy and reduces the processing time of text emotion classification. The solution provides an average class accuracy of 96.63% against a current average accuracy of 91.56%. It also provides a processing time of 23.3 milliseconds compared to the current average processing time of 33.2 milliseconds. Finally, this study solves the issues of gradient saturation and data imbalance. It enhances overall average class accuracy and decreases processing time.
△ Less
Submitted 4 March, 2022;
originally announced March 2022.
-
Generative Adversarial Network (GAN) and Enhanced Root Mean Square Error (ERMSE): Deep Learning for Stock Price Movement Prediction
Authors:
Ashish Kumar,
Abeer Alsadoon,
P. W. C. Prasad,
Salma Abdullah,
Tarik A. Rashid,
Duong Thu Hang Pham,
Tran Quoc Vinh Nguyen
Abstract:
The prediction of stock price movement direction is significant in financial circles and academic. Stock price contains complex, incomplete, and fuzzy information which makes it an extremely difficult task to predict its development trend. Predicting and analysing financial data is a nonlinear, time-dependent problem. With rapid development in machine learning and deep learning, this task can be p…
▽ More
The prediction of stock price movement direction is significant in financial circles and academic. Stock price contains complex, incomplete, and fuzzy information which makes it an extremely difficult task to predict its development trend. Predicting and analysing financial data is a nonlinear, time-dependent problem. With rapid development in machine learning and deep learning, this task can be performed more effectively by a purposely designed network. This paper aims to improve prediction accuracy and minimizing forecasting error loss through deep learning architecture by using Generative Adversarial Networks. It was proposed a generic model consisting of Phase-space Reconstruction (PSR) method for reconstructing price series and Generative Adversarial Network (GAN) which is a combination of two neural networks which are Long Short-Term Memory (LSTM) as Generative model and Convolutional Neural Network (CNN) as Discriminative model for adversarial training to forecast the stock market. LSTM will generate new instances based on historical basic indicators information and then CNN will estimate whether the data is predicted by LSTM or is real. It was found that the Generative Adversarial Network (GAN) has performed well on the enhanced root mean square error to LSTM, as it was 4.35% more accurate in predicting the direction and reduced processing time and RMSE by 78 secs and 0.029, respectively. This study provides a better result in the accuracy of the stock index. It seems that the proposed system concentrates on minimizing the root mean square error and processing time and improving the direction prediction accuracy, and provides a better result in the accuracy of the stock index.
△ Less
Submitted 30 November, 2021;
originally announced December 2021.
-
Optimal Technical Indicator-based Trading Strategies Using NSGA-II
Authors:
P. Shanmukh Kali Prasad,
Vadlamani Madhav,
Ramanuj Lal,
Vadlamani Ravi
Abstract:
This paper proposes non-dominated sorting genetic algorithm-II (NSGA-II ) in the context of technical indicator-based stock trading, by finding optimal combinations of technical indicators to generate buy and sell strategies such that the objectives, namely, Sharpe ratio and Maximum Drawdown are maximized and minimized respectively. NSGA-II is chosen because it is a very popular and powerful bi-ob…
▽ More
This paper proposes non-dominated sorting genetic algorithm-II (NSGA-II ) in the context of technical indicator-based stock trading, by finding optimal combinations of technical indicators to generate buy and sell strategies such that the objectives, namely, Sharpe ratio and Maximum Drawdown are maximized and minimized respectively. NSGA-II is chosen because it is a very popular and powerful bi-objective evolutionary algorithm. The training and testing used a rolling-based approach (two years training and a year for testing) and thus the results of the approach seem to be considerably better in stable periods without major economic fluctuations. Further, another important contribution of this study is to incorporate the transaction cost and domain expertise in the whole modeling approach.
△ Less
Submitted 25 January, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Mixed Reality using Illumination-aware Gradient Mixing in Surgical Telepresence: Enhanced Multi-layer Visualization
Authors:
Nirakar Puri,
Abeer Alsadoon,
P. W. C. Prasad,
Nada Alsalami,
Tarik A. Rashid
Abstract:
Background and aim: Surgical telepresence using augmented perception has been applied, but mixed reality is still being researched and is only theoretical. The aim of this work is to propose a solution to improve the visualization in the final merged video by producing globally consistent videos when the intensity of illumination in the input source and target video varies. Methodology: The propos…
▽ More
Background and aim: Surgical telepresence using augmented perception has been applied, but mixed reality is still being researched and is only theoretical. The aim of this work is to propose a solution to improve the visualization in the final merged video by producing globally consistent videos when the intensity of illumination in the input source and target video varies. Methodology: The proposed system uses an enhanced multi-layer visualization with illumination-aware gradient mixing using Illumination Aware Video Composition algorithm. Particle Swarm Optimization Algorithm is used to find the best sample pair from foreground and background region and image pixel correlation to estimate the alpha matte. Particle Swarm Optimization algorithm helps to get the original colour and depth of the unknown pixel in the unknown region. Result: Our results showed improved accuracy caused by reducing the Mean squared Error for selecting the best sample pair for unknown region in 10 each sample for bowel, jaw and breast. The amount of this reduction is 16.48% from the state of art system. As a result, the visibility accuracy is improved from 89.4 to 97.7% which helped to clear the hand vision even in the difference of light. Conclusion: Illumination effect and alpha pixel correlation improves the visualization accuracy and produces a globally consistent composition results and maintains the temporal coherency when compositing two videos with high and inverse illumination effect. In addition, this paper provides a solution for selecting the best sampling pair for the unknown region to obtain the original colour and depth.
△ Less
Submitted 21 August, 2021;
originally announced October 2021.
-
Ground material classification for UAV-based photogrammetric 3D data A 2D-3D Hybrid Approach
Authors:
Meida Chen,
Andrew Feng,
Yu Hou,
Kyle McCullough,
Pratusha Bhuvana Prasad,
Lucio Soibelman
Abstract:
In recent years, photogrammetry has been widely used in many areas to create photorealistic 3D virtual data representing the physical environment. The innovation of small unmanned aerial vehicles (sUAVs) has provided additional high-resolution imaging capabilities with low cost for mapping a relatively large area of interest. These cutting-edge technologies have caught the US Army and Navy's atten…
▽ More
In recent years, photogrammetry has been widely used in many areas to create photorealistic 3D virtual data representing the physical environment. The innovation of small unmanned aerial vehicles (sUAVs) has provided additional high-resolution imaging capabilities with low cost for mapping a relatively large area of interest. These cutting-edge technologies have caught the US Army and Navy's attention for the purpose of rapid 3D battlefield reconstruction, virtual training, and simulations. Our previous works have demonstrated the importance of information extraction from the derived photogrammetric data to create semantic-rich virtual environments (Chen et al., 2019). For example, an increase of simulation realism and fidelity was achieved by segmenting and replacing photogrammetric trees with game-ready tree models. In this work, we further investigated the semantic information extraction problem and focused on the ground material segmentation and object detection tasks. The main innovation of this work was that we leveraged both the original 2D images and the derived 3D photogrammetric data to overcome the challenges faced when using each individual data source. For ground material segmentation, we utilized an existing convolutional neural network architecture (i.e., 3DMV) which was originally designed for segmenting RGB-D sensed indoor data. We improved its performance for outdoor photogrammetric data by introducing a depth pooling layer in the architecture to take into consideration the distance between the source images and the reconstructed terrain model. To test the performance of our improved 3DMV, a ground truth ground material database was created using data from the One World Terrain (OWT) data repository. Finally, a workflow for importing the segmented ground materials into a virtual simulation scene was introduced, and visual results are reported in this paper.
△ Less
Submitted 15 October, 2022; v1 submitted 24 September, 2021;
originally announced September 2021.
-
A Novel Solution of an Elastic Net Regularization for Dementia Knowledge Discovery using Deep Learning
Authors:
Kshitiz Shrestha,
Omar Hisham Alsadoon,
Abeer Alsadoon,
Tarik A. Rashid,
Rasha S. Ali,
P. W. C. Prasad,
Oday D. Jerew
Abstract:
Background and Aim: Accurate classification of Magnetic Resonance Images (MRI) is essential to accurately predict Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) conversion. Meanwhile, deep learning has been successfully implemented to classify and predict dementia disease. However, the accuracy of MRI image classification is low. This paper aims to increase the accuracy and reduce the…
▽ More
Background and Aim: Accurate classification of Magnetic Resonance Images (MRI) is essential to accurately predict Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) conversion. Meanwhile, deep learning has been successfully implemented to classify and predict dementia disease. However, the accuracy of MRI image classification is low. This paper aims to increase the accuracy and reduce the processing time of classification through Deep Learning Architecture by using Elastic Net Regularization in Feature Selection. Methodology: The proposed system consists of Convolutional Neural Network (CNN) to enhance the accuracy of classification and prediction by using Elastic Net Regularization. Initially, the MRI images are fed into CNN for features extraction through convolutional layers alternate with pooling layers, and then through a fully connected layer. After that, the features extracted are subjected to Principle Component Analysis (PCA) and Elastic Net Regularization for feature selection. Finally, the selected features are used as an input to Extreme Machine Learning (EML) for the classification of MRI images. Results: The result shows that the accuracy of the proposed solution is better than the current system. In addition to that, the proposed method has improved the classification accuracy by 5% on average and reduced the processing time by 30 ~ 40 seconds on average. Conclusion: The proposed system is focused on improving the accuracy and processing time of MCI converters/non-converters classification. It consists of features extraction, feature selection, and classification using CNN, FreeSurfer, PCA, Elastic Net, Extreme Machine Learning. Finally, this study enhances the accuracy and the processing time by using Elastic Net Regularization, which provides important selected features for classification.
△ Less
Submitted 21 August, 2021;
originally announced September 2021.
-
Deep Learning for Breast Cancer Classification: Enhanced Tangent Function
Authors:
Ashu Thapa,
Abeer Alsadoon,
P. W. C. Prasad,
Simi Bajaj,
Omar Hisham Alsadoon,
Tarik A. Rashid,
Rasha S. Ali,
Oday D. Jerew
Abstract:
Background and Aim: Recently, deep learning using convolutional neural network has been used successfully to classify the images of breast cells accurately. However, the accuracy of manual classification of those histopathological images is comparatively low. This research aims to increase the accuracy of the classification of breast cancer images by utilizing a Patch-Based Classifier (PBC) along…
▽ More
Background and Aim: Recently, deep learning using convolutional neural network has been used successfully to classify the images of breast cells accurately. However, the accuracy of manual classification of those histopathological images is comparatively low. This research aims to increase the accuracy of the classification of breast cancer images by utilizing a Patch-Based Classifier (PBC) along with deep learning architecture. Methodology: The proposed system consists of a Deep Convolutional Neural Network (DCNN) that helps in enhancing and increasing the accuracy of the classification process. This is done by the use of the Patch-based Classifier (PBC). CNN has completely different layers where images are first fed through convolutional layers using hyperbolic tangent function together with the max-pooling layer, drop out layers, and SoftMax function for classification. Further, the output obtained is fed to a patch-based classifier that consists of patch-wise classification output followed by majority voting. Results: The results are obtained throughout the classification stage for breast cancer images that are collected from breast-histology datasets. The proposed solution improves the accuracy of classification whether or not the images had normal, benign, in-situ, or invasive carcinoma from 87% to 94% with a decrease in processing time from 0.45 s to 0.2s on average. Conclusion: The proposed solution focused on increasing the accuracy of classifying cancer in the breast by enhancing the image contrast and reducing the vanishing gradient. Finally, this solution for the implementation of the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique and modified tangent function helps in increasing the accuracy.
△ Less
Submitted 1 July, 2021;
originally announced August 2021.
-
A Review-based Taxonomy for Secure Health Care Monitoring: Wireless Smart Cameras
Authors:
Ravi Teja Batchu,
Abeer Alsadoon,
P. W. C. Prasad,
Rasha S. Ali,
Tarik A. Rashid,
Ghossoon Alsadoon,
Oday D. Jerew
Abstract:
Health records data security is one of the main challenges in e-health systems. Authentication is one of the essential security services to support the stored data confidentiality, integrity, and availability. This research focuses on the secure storage of patient and medical records in the healthcare sector where data security and unauthorized access is an ongoing issue. A potential solution come…
▽ More
Health records data security is one of the main challenges in e-health systems. Authentication is one of the essential security services to support the stored data confidentiality, integrity, and availability. This research focuses on the secure storage of patient and medical records in the healthcare sector where data security and unauthorized access is an ongoing issue. A potential solution comes from biometrics, although their use may be time-consuming and can slow down data retrieval. This research aims to overcome these challenges and enhance data access control in the healthcare sector through the addition of biometrics in the form of fingerprints. The proposed model for application in the healthcare sector consists of Collection, Network communication, and Authentication (CNA) using biometrics, which replaces an existing password-based access control method. A sensor then collects data and by using a network (wireless or Zig-bee), a connection is established, after connectivity analytics and data management work which processes and aggregate the data. Subsequently, access is granted to authenticated users of the application. This IoT-based biometric authentication system facilitates effective recognition and ensures confidentiality, integrity, and reliability of patients, records and other sensitive data. The proposed solution provides reliable access to healthcare data and enables secure access through the process of user and device authentication. The proposed model has been developed for access control to data through the authentication of users in healthcare to reduce data manipulation or theft.
△ Less
Submitted 5 July, 2021;
originally announced July 2021.
-
A Novel Solution of Using Mixed Reality in Bowel and Oral and Maxillofacial Surgical Telepresence: 3D Mean Value Cloning algorithm
Authors:
Arjina Maharjan,
Abeer Alsadoon,
P. W. C. Prasad,
Nada AlSallami,
Tarik A. Rashid,
Ahmad Alrubaie,
Sami Haddad
Abstract:
Background and aim: Most of the Mixed Reality models used in the surgical telepresence are suffering from discrepancies in the boundary area and spatial-temporal inconsistency due to the illumination variation in the video frames. The aim behind this work is to propose a new solution that helps produce the composite video by merging the augmented video of the surgery site and the virtual hand of t…
▽ More
Background and aim: Most of the Mixed Reality models used in the surgical telepresence are suffering from discrepancies in the boundary area and spatial-temporal inconsistency due to the illumination variation in the video frames. The aim behind this work is to propose a new solution that helps produce the composite video by merging the augmented video of the surgery site and the virtual hand of the remote expertise surgeon. The purpose of the proposed solution is to decrease the processing time and enhance the accuracy of merged video by decreasing the overlay and visualization error and removing occlusion and artefacts. Methodology: The proposed system enhanced the mean value cloning algorithm that helps to maintain the spatial-temporal consistency of the final composite video. The enhanced algorithm includes the 3D mean value coordinates and improvised mean value interpolant in the image cloning process, which helps to reduce the sawtooth, smudging and discolouration artefacts around the blending region. Results: As compared to the state of the art solution, the accuracy in terms of overlay error of the proposed solution is improved from 1.01mm to 0.80mm whereas the accuracy in terms of visualization error is improved from 98.8% to 99.4%. The processing time is reduced to 0.173 seconds from 0.211 seconds. Conclusion: Our solution helps make the object of interest consistent with the light intensity of the target image by adding the space distance that helps maintain the spatial consistency in the final merged video.
△ Less
Submitted 17 March, 2021;
originally announced April 2021.
-
Deep Learning for Vision-Based Fall Detection System: Enhanced Optical Dynamic Flow
Authors:
Sagar Chhetri,
Abeer Alsadoon,
Thair Al Dala in,
P. W. C. Prasad,
Tarik A. Rashid,
Angelika Maag
Abstract:
Accurate fall detection for the assistance of older people is crucial to reduce incidents of deaths or injuries due to falls. Meanwhile, a vision-based fall detection system has shown some significant results to detect falls. Still, numerous challenges need to be resolved. The impact of deep learning has changed the landscape of the vision-based system, such as action recognition. The deep learnin…
▽ More
Accurate fall detection for the assistance of older people is crucial to reduce incidents of deaths or injuries due to falls. Meanwhile, a vision-based fall detection system has shown some significant results to detect falls. Still, numerous challenges need to be resolved. The impact of deep learning has changed the landscape of the vision-based system, such as action recognition. The deep learning technique has not been successfully implemented in vision-based fall detection systems due to the requirement of a large amount of computation power and the requirement of a large amount of sample training data. This research aims to propose a vision-based fall detection system that improves the accuracy of fall detection in some complex environments such as the change of light condition in the room. Also, this research aims to increase the performance of the pre-processing of video images. The proposed system consists of the Enhanced Dynamic Optical Flow technique that encodes the temporal data of optical flow videos by the method of rank pooling, which thereby improves the processing time of fall detection and improves the classification accuracy in dynamic lighting conditions. The experimental results showed that the classification accuracy of the fall detection improved by around 3% and the processing time by 40 to 50ms. The proposed system concentrates on decreasing the processing time of fall detection and improving classification accuracy. Meanwhile, it provides a mechanism for summarizing a video into a single image by using a dynamic optical flow technique, which helps to increase the performance of image pre-processing steps.
△ Less
Submitted 18 March, 2021;
originally announced April 2021.
-
A Novel Visualization System of Using Augmented Reality in Knee Replacement Surgery: Enhanced Bidirectional Maximum Correntropy Algorithm
Authors:
Nitish Maharjan,
Abeer Alsadoon,
P. W. C. Prasad,
Salma Abdullah,
Tarik A. Rashid
Abstract:
Background and aim: Image registration and alignment are the main limitations of augmented reality-based knee replacement surgery. This research aims to decrease the registration error, eliminate outcomes that are trapped in local minima to improve the alignment problems, handle the occlusion, and maximize the overlapping parts. Methodology: markerless image registration method was used for Augmen…
▽ More
Background and aim: Image registration and alignment are the main limitations of augmented reality-based knee replacement surgery. This research aims to decrease the registration error, eliminate outcomes that are trapped in local minima to improve the alignment problems, handle the occlusion, and maximize the overlapping parts. Methodology: markerless image registration method was used for Augmented reality-based knee replacement surgery to guide and visualize the surgical operation. While weight least square algorithm was used to enhance stereo camera-based tracking by filling border occlusion in right to left direction and non-border occlusion from left to right direction. Results: This study has improved video precision to 0.57 mm~0.61 mm alignment error. Furthermore, with the use of bidirectional points, for example, forwards and backwards directional cloud point, the iteration on image registration was decreased. This has led to improve the processing time as well. The processing time of video frames was improved to 7.4~11.74 fps. Conclusions: It seems clear that this proposed system has focused on overcoming the misalignment difficulty caused by movement of patient and enhancing the AR visualization during knee replacement surgery. The proposed system was reliable and favorable which helps in eliminating alignment error by ascertaining the optimal rigid transformation between two cloud points and removing the outliers and non-Gaussian noise. The proposed augmented reality system helps in accurate visualization and navigation of anatomy of knee such as femur, tibia, cartilage, blood vessels, etc.
△ Less
Submitted 13 March, 2021;
originally announced April 2021.
-
Leveraging Adaptive Color Augmentation in Convolutional Neural Networks for Deep Skin Lesion Segmentation
Authors:
Anindo Saha,
Prem Prasad,
Abdullah Thabit
Abstract:
Fully automatic detection of skin lesions in dermatoscopic images can facilitate early diagnosis and repression of malignant melanoma and non-melanoma skin cancer. Although convolutional neural networks are a powerful solution, they are limited by the illumination spectrum of annotated dermatoscopic screening images, where color is an important discriminative feature. In this paper, we propose an…
▽ More
Fully automatic detection of skin lesions in dermatoscopic images can facilitate early diagnosis and repression of malignant melanoma and non-melanoma skin cancer. Although convolutional neural networks are a powerful solution, they are limited by the illumination spectrum of annotated dermatoscopic screening images, where color is an important discriminative feature. In this paper, we propose an adaptive color augmentation technique to amplify data expression and model performance, while regulating color difference and saturation to minimize the risks of using synthetic data. Through deep visualization, we qualitatively identify and verify the semantic structural features learned by the network for discriminating skin lesions against normal skin tissue. The overall system achieves a Dice Ratio of 0.891 with 0.943 sensitivity and 0.932 specificity on the ISIC 2018 Testing Set for segmentation.
△ Less
Submitted 30 October, 2020;
originally announced November 2020.
-
Semantic Segmentation and Data Fusion of Microsoft Bing 3D Cities and Small UAV-based Photogrammetric Data
Authors:
Meida Chen,
Andrew Feng,
Kyle McCullough,
Pratusha Bhuvana Prasad,
Ryan McAlinden,
Lucio Soibelman
Abstract:
With state-of-the-art sensing and photogrammetric techniques, Microsoft Bing Maps team has created over 125 highly detailed 3D cities from 11 different countries that cover hundreds of thousands of square kilometer areas. The 3D city models were created using the photogrammetric technique with high-resolution images that were captured from aircraft-mounted cameras. Such a large 3D city database ha…
▽ More
With state-of-the-art sensing and photogrammetric techniques, Microsoft Bing Maps team has created over 125 highly detailed 3D cities from 11 different countries that cover hundreds of thousands of square kilometer areas. The 3D city models were created using the photogrammetric technique with high-resolution images that were captured from aircraft-mounted cameras. Such a large 3D city database has caught the attention of the US Army for creating virtual simulation environments to support military operations. However, the 3D city models do not have semantic information such as buildings, vegetation, and ground and cannot allow sophisticated user-level and system-level interaction. At I/ITSEC 2019, the authors presented a fully automated data segmentation and object information extraction framework for creating simulation terrain using UAV-based photogrammetric data. This paper discusses the next steps in extending our designed data segmentation framework for segmenting 3D city data. In this study, the authors first investigated the strengths and limitations of the existing framework when applied to the Bing data. The main differences between UAV-based and aircraft-based photogrammetric data are highlighted. The data quality issues in the aircraft-based photogrammetric data, which can negatively affect the segmentation performance, are identified. Based on the findings, a workflow was designed specifically for segmenting Bing data while considering its characteristics. In addition, since the ultimate goal is to combine the use of both small unmanned aerial vehicle (UAV) collected data and the Bing data in a virtual simulation environment, data from these two sources needed to be aligned and registered together. To this end, the authors also proposed a data registration workflow that utilized the traditional iterative closest point (ICP) with the extracted semantic information.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.
-
Generating synthetic photogrammetric data for training deep learning based 3D point cloud segmentation models
Authors:
Meida Chen,
Andrew Feng,
Kyle McCullough,
Pratusha Bhuvana Prasad,
Ryan McAlinden,
Lucio Soibelman
Abstract:
At I/ITSEC 2019, the authors presented a fully-automated workflow to segment 3D photogrammetric point-clouds/meshes and extract object information, including individual tree locations and ground materials (Chen et al., 2019). The ultimate goal is to create realistic virtual environments and provide the necessary information for simulation. We tested the generalizability of the previously proposed…
▽ More
At I/ITSEC 2019, the authors presented a fully-automated workflow to segment 3D photogrammetric point-clouds/meshes and extract object information, including individual tree locations and ground materials (Chen et al., 2019). The ultimate goal is to create realistic virtual environments and provide the necessary information for simulation. We tested the generalizability of the previously proposed framework using a database created under the U.S. Army's One World Terrain (OWT) project with a variety of landscapes (i.e., various buildings styles, types of vegetation, and urban density) and different data qualities (i.e., flight altitudes and overlap between images). Although the database is considerably larger than existing databases, it remains unknown whether deep-learning algorithms have truly achieved their full potential in terms of accuracy, as sizable data sets for training and validation are currently lacking. Obtaining large annotated 3D point-cloud databases is time-consuming and labor-intensive, not only from a data annotation perspective in which the data must be manually labeled by well-trained personnel, but also from a raw data collection and processing perspective. Furthermore, it is generally difficult for segmentation models to differentiate objects, such as buildings and tree masses, and these types of scenarios do not always exist in the collected data set. Thus, the objective of this study is to investigate using synthetic photogrammetric data to substitute real-world data in training deep-learning algorithms. We have investigated methods for generating synthetic UAV-based photogrammetric data to provide a sufficiently sized database for training a deep-learning algorithm with the ability to enlarge the data size for scenarios in which deep-learning models have difficulties.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.
-
Fully Automated Photogrammetric Data Segmentation and Object Information Extraction Approach for Creating Simulation Terrain
Authors:
Meida Chen,
Andrew Feng,
Kyle McCullough,
Pratusha Bhuvana Prasad,
Ryan McAlinden,
Lucio Soibelman,
Mike Enloe
Abstract:
Our previous works have demonstrated that visually realistic 3D meshes can be automatically reconstructed with low-cost, off-the-shelf unmanned aerial systems (UAS) equipped with capable cameras, and efficient photogrammetric software techniques. However, such generated data do not contain semantic information/features of objects (i.e., man-made objects, vegetation, ground, object materials, etc.)…
▽ More
Our previous works have demonstrated that visually realistic 3D meshes can be automatically reconstructed with low-cost, off-the-shelf unmanned aerial systems (UAS) equipped with capable cameras, and efficient photogrammetric software techniques. However, such generated data do not contain semantic information/features of objects (i.e., man-made objects, vegetation, ground, object materials, etc.) and cannot allow the sophisticated user-level and system-level interaction. Considering the use case of the data in creating realistic virtual environments for training and simulations (i.e., mission planning, rehearsal, threat detection, etc.), segmenting the data and extracting object information are essential tasks. Thus, the objective of this research is to design and develop a fully automated photogrammetric data segmentation and object information extraction framework. To validate the proposed framework, the segmented data and extracted features were used to create virtual environments in the authors previously designed simulation tool i.e., Aerial Terrain Line of Sight Analysis System (ATLAS). The results showed that 3D mesh trees could be replaced with geo-typical 3D tree models using the extracted individual tree locations. The extracted tree features (i.e., color, width, height) are valuable for selecting the appropriate tree species and enhance visual quality. Furthermore, the identified ground material information can be taken into consideration for pathfinding. The shortest path can be computed not only considering the physical distance, but also considering the off-road vehicle performance capabilities on different ground surface materials.
△ Less
Submitted 9 August, 2020;
originally announced August 2020.
-
Automated Diabetic Retinopathy Grading using Deep Convolutional Neural Network
Authors:
Saket S. Chaturvedi,
Kajol Gupta,
Vaishali Ninawe,
Prakash S. Prasad
Abstract:
Diabetic Retinopathy is a global health problem, influences 100 million individuals worldwide, and in the next few decades, these incidences are expected to reach epidemic proportions. Diabetic Retinopathy is a subtle eye disease that can cause sudden, irreversible vision loss. The early-stage Diabetic Retinopathy diagnosis can be challenging for human experts, considering the visual complexity of…
▽ More
Diabetic Retinopathy is a global health problem, influences 100 million individuals worldwide, and in the next few decades, these incidences are expected to reach epidemic proportions. Diabetic Retinopathy is a subtle eye disease that can cause sudden, irreversible vision loss. The early-stage Diabetic Retinopathy diagnosis can be challenging for human experts, considering the visual complexity of fundus photography retinal images. However, Early Stage detection of Diabetic Retinopathy can significantly alter the severe vision loss problem. The competence of computer-aided detection systems to accurately detect the Diabetic Retinopathy had popularized them among researchers. In this study, we have utilized a pre-trained DenseNet121 network with several modifications and trained on APTOS 2019 dataset. The proposed method outperformed other state-of-the-art networks in early-stage detection and achieved 96.51% accuracy in severity grading of Diabetic Retinopathy for multi-label classification and achieved 94.44% accuracy for single-class classification method. Moreover, the precision, recall, f1-score, and quadratic weighted kappa for our network was reported as 86%, 87%, 86%, and 91.96%, respectively. Our proposed architecture is simultaneously very simple, accurate, and efficient concerning computational time and space.
△ Less
Submitted 14 April, 2020;
originally announced April 2020.
-
Learning Formation of Physically-Based Face Attributes
Authors:
Ruilong Li,
Karl Bladin,
Yajie Zhao,
Chinmay Chinara,
Owen Ingraham,
Pengda Xiang,
Xinglei Ren,
Pratusha Prasad,
Bipin Kishore,
Jun Xing,
Hao Li
Abstract:
Based on a combined data set of 4000 high resolution facial scans, we introduce a non-linear morphable face model, capable of producing multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering. We aim to maximize the variety of face identities, while increasing the robustness of correspondence between unique components, including m…
▽ More
Based on a combined data set of 4000 high resolution facial scans, we introduce a non-linear morphable face model, capable of producing multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering. We aim to maximize the variety of face identities, while increasing the robustness of correspondence between unique components, including middle-frequency geometry, albedo maps, specular intensity maps and high-frequency displacement details. Our deep learning based generative model learns to correlate albedo and geometry, which ensures the anatomical correctness of the generated assets. We demonstrate potential use of our generative model for novel identity generation, model fitting, interpolation, animation, high fidelity data visualization, and low-to-high resolution data domain transferring. We hope the release of this generative model will encourage further cooperation between all graphics, vision, and data focused professionals while demonstrating the cumulative value of every individual's complete biometric profile.
△ Less
Submitted 23 April, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Advances in Computer-Aided Diagnosis of Diabetic Retinopathy
Authors:
Saket S. Chaturvedi,
Kajol Gupta,
Vaishali Ninawe,
Prakash S. Prasad
Abstract:
Diabetic Retinopathy is a critical health problem influences 100 million individuals worldwide, and these figures are expected to rise, particularly in Asia. Diabetic Retinopathy is a chronic eye disease which can lead to irreversible vision loss. Considering the visual complexity of retinal images, the early-stage diagnosis of Diabetic Retinopathy can be challenging for human experts. However, Ea…
▽ More
Diabetic Retinopathy is a critical health problem influences 100 million individuals worldwide, and these figures are expected to rise, particularly in Asia. Diabetic Retinopathy is a chronic eye disease which can lead to irreversible vision loss. Considering the visual complexity of retinal images, the early-stage diagnosis of Diabetic Retinopathy can be challenging for human experts. However, Early detection of Diabetic Retinopathy can significantly help to avoid permanent vision loss. The capability of computer-aided detection systems to accurately and efficiently detect the diabetic retinopathy had popularized them among researchers. In this review paper, the literature search was conducted on PubMed, Google Scholar, IEEE Explorer with a focus on the computer-aided detection of Diabetic Retinopathy using either of Machine Learning or Deep Learning algorithms. Moreover, this study also explores the typical methodology utilized for the computer-aided diagnosis of Diabetic Retinopathy. This review paper is aimed to direct the researchers about the limitations of current methods and identify the specific areas in the field to boost future research.
△ Less
Submitted 21 September, 2019;
originally announced September 2019.
-
Skin Lesion Analyser: An Efficient Seven-Way Multi-Class Skin Cancer Classification Using MobileNet
Authors:
Saket S. Chaturvedi,
Kajol Gupta,
Prakash. S. Prasad
Abstract:
Skin cancer, a major form of cancer, is a critical public health problem with 123,000 newly diagnosed melanoma cases and between 2 and 3 million non-melanoma cases worldwide each year. The leading cause of skin cancer is high exposure of skin cells to UV radiation, which can damage the DNA inside skin cells leading to uncontrolled growth of skin cells. Skin cancer is primarily diagnosed visually e…
▽ More
Skin cancer, a major form of cancer, is a critical public health problem with 123,000 newly diagnosed melanoma cases and between 2 and 3 million non-melanoma cases worldwide each year. The leading cause of skin cancer is high exposure of skin cells to UV radiation, which can damage the DNA inside skin cells leading to uncontrolled growth of skin cells. Skin cancer is primarily diagnosed visually employing clinical screening, a biopsy, dermoscopic analysis, and histopathological examination. It has been demonstrated that the dermoscopic analysis in the hands of inexperienced dermatologists may cause a reduction in diagnostic accuracy. Early detection and screening of skin cancer have the potential to reduce mortality and morbidity. Previous studies have shown Deep Learning ability to perform better than human experts in several visual recognition tasks. In this paper, we propose an efficient seven-way automated multi-class skin cancer classification system having performance comparable with expert dermatologists. We used a pretrained MobileNet model to train over HAM10000 dataset using transfer learning. The model classifies skin lesion image with a categorical accuracy of 83.1 percent, top2 accuracy of 91.36 percent and top3 accuracy of 95.34 percent. The weighted average of precision, recall, and f1-score were found to be 0.89, 0.83, and 0.83 respectively. The model has been deployed as a web application for public use at (https://saketchaturvedi.github.io). This fast, expansible method holds the potential for substantial clinical impact, including broadening the scope of primary care practice and augmenting clinical decision-making for dermatology specialists.
△ Less
Submitted 27 May, 2020; v1 submitted 7 July, 2019;
originally announced July 2019.
-
Design and Development of Underwater Vehicle: ANAHITA
Authors:
Akash Jain,
Manish Kumar,
Rithvik Patibandla,
Balamurugan R,
Naveen Chandra R,
Abhinav Arora,
Akash K Singh,
Varun Pawar,
Aditya Rai,
Medha Agarwal,
Priank Prasad,
Vandit Sanadhya,
Prateek Yadav,
Inshu Namdev,
Nilay Shah,
Saksham Mittal,
Ayush Gupta,
Naman Agarwal,
Mangal Kothari
Abstract:
Anahita is an autonomous underwater vehicle which is currently being developed by interdisciplinary team of students at Indian Institute of Technology(IIT) Kanpur with aim to provide a platform for research in AUV to undergraduate students. This is the second vehicle which is being designed by AUV-IITK team to participate in 6th NIOT-SAVe competition organized by the National Institute of Ocean Te…
▽ More
Anahita is an autonomous underwater vehicle which is currently being developed by interdisciplinary team of students at Indian Institute of Technology(IIT) Kanpur with aim to provide a platform for research in AUV to undergraduate students. This is the second vehicle which is being designed by AUV-IITK team to participate in 6th NIOT-SAVe competition organized by the National Institute of Ocean Technology, Chennai. The Vehicle has been completely redesigned with the major improvements in modularity and ease of access of all the components, keeping the design very compact and efficient. New advancements in the vehicle include, power distribution system and monitoring system. The sensors include the inertial measurement units (IMU), hydrophone array, a depth sensor, and two RGB cameras. The current vehicle features hot swappable battery pods giving a huge advantage over the previous vehicle, for longer runtime.
△ Less
Submitted 1 October, 2021; v1 submitted 1 March, 2019;
originally announced March 2019.
-
Deep Recurrent Neural Networks for Time Series Prediction
Authors:
Sharat C. Prasad,
Piyush Prasad
Abstract:
Ability of deep networks to extract high level features and of recurrent networks to perform time-series inference have been studied. In view of universality of one hidden layer network at approximating functions under weak constraints, the benefit of multiple layers is to enlarge the space of dynamical systems approximated or, given the space, reduce the number of units required for a certain err…
▽ More
Ability of deep networks to extract high level features and of recurrent networks to perform time-series inference have been studied. In view of universality of one hidden layer network at approximating functions under weak constraints, the benefit of multiple layers is to enlarge the space of dynamical systems approximated or, given the space, reduce the number of units required for a certain error. Traditionally shallow networks with manually engineered features are used, back-propagation extent is limited to one and attempt to choose a large number of hidden units to satisfy the Markov condition is made. In case of Markov models, it has been shown that many systems need to be modeled as higher order. In the present work, we present deep recurrent networks with longer backpropagation through time extent as a solution to modeling systems that are high order and to predicting ahead. We study epileptic seizure suppression electro-stimulator. Extraction of manually engineered complex features and prediction employing them has not allowed small low-power implementations as, to avoid possibility of surgery, extraction of any features that may be required has to be included. In this solution, a recurrent neural network performs both feature extraction and prediction. We prove analytically that adding hidden layers or increasing backpropagation extent increases the rate of decrease of approximation error. A Dynamic Programming (DP) training procedure employing matrix operations is derived. DP and use of matrix operations makes the procedure efficient particularly when using data-parallel computing. The simulation studies show the geometry of the parameter space, that the network learns the temporal structure, that parameters converge while model output displays same dynamic behavior as the system and greater than .99 Average Detection Rate on all real seizure data tried.
△ Less
Submitted 18 December, 2014; v1 submitted 22 July, 2014;
originally announced July 2014.
-
Proactive Web Server Protocol for Complaint Assessment
Authors:
G. Vijay Kumar,
Ravikumar S. Raykundaliya,
Dr. P. Naga Prasad
Abstract:
Vulnerability Discovery with attack Injection security threats are increasing for the server software, when software is developed, the software tested for the functionality. Due to unawareness of software vulnerabilities most of the software before pre-Release the software should be thoroughly tested for not only functionality reliability, but should be tested for the security flows (or) vulnerabi…
▽ More
Vulnerability Discovery with attack Injection security threats are increasing for the server software, when software is developed, the software tested for the functionality. Due to unawareness of software vulnerabilities most of the software before pre-Release the software should be thoroughly tested for not only functionality reliability, but should be tested for the security flows (or) vulnerabilities. The approaches such as fuzzers, Fault injection, vulnerabilities scanners, static vulnerabilities analyzers, Run time prevention mechanisms and software Rejuvenation are identifying the un-patched software which is open for security threats address to solve the problem "security testing". These techniques are useful for generating attacks but cannot be extendable for the new land of attacks. The system called proactive vulnerability attack injection tool is suitable for adding new attacks injection vectors, methods to define new protocol states (or) Specification using the interface of tool includes Network server protocol specification using GUI, Attacks generator, Attack injector, monitoring module at the victim injector, monitoring module at the victim machine and the attacks injection report generation. This tool can address most of the vulnerabilities (or) security flows.
△ Less
Submitted 9 February, 2014;
originally announced February 2014.
-
Minimum Dominating Set for a Point Set in $\IR^2$
Authors:
Ramesh K. Jallu,
Prajwal R. Prasad,
Gautam K. Das
Abstract:
In this article, we consider the problem of computing minimum dominating set for a given set $S$ of $n$ points in $\IR^2$. Here the objective is to find a minimum cardinality subset $S'$ of $S$ such that the union of the unit radius disks centered at the points in $S'$ covers all the points in $S$. We first propose a simple 4-factor and 3-factor approximation algorithms in $O(n^6 \log n)$ and…
▽ More
In this article, we consider the problem of computing minimum dominating set for a given set $S$ of $n$ points in $\IR^2$. Here the objective is to find a minimum cardinality subset $S'$ of $S$ such that the union of the unit radius disks centered at the points in $S'$ covers all the points in $S$. We first propose a simple 4-factor and 3-factor approximation algorithms in $O(n^6 \log n)$ and $O(n^{11} \log n)$ time respectively improving time complexities by a factor of $O(n^2)$ and $O(n^4)$ respectively over the best known result available in the literature [M. De, G.K. Das, P. Carmi and S.C. Nandy, {\it Approximation algorithms for a variant of discrete piercing set problem for unit disk}, Int. J. of Comp. Geom. and Appl., to appear]. Finally, we propose a very important shifting lemma, which is of independent interest and using this lemma we propose a $\frac{5}{2}$-factor approximation algorithm and a PTAS for the minimum dominating set problem.
△ Less
Submitted 5 January, 2014; v1 submitted 27 December, 2013;
originally announced December 2013.
-
A Novel Approach for Authenticating Textual or Graphical Passwords Using Hopfield Neural Network
Authors:
ASN Chakravarthy,
P S Avadhani,
P. E. S. N Krishna Prasad,
N. Rajeevand,
D. Rajasekhar Reddy
Abstract:
Password authentication using Hopfield Networks is presented in this paper. In this paper we discussed the Hopfield Network Scheme for Textual and graphical passwords, for which input Password will be converted in to probabilistic values. We observed how to get password authentication using Probabilistic values for Textual passwords and Graphical passwords. This study proposes the use of a Hopfiel…
▽ More
Password authentication using Hopfield Networks is presented in this paper. In this paper we discussed the Hopfield Network Scheme for Textual and graphical passwords, for which input Password will be converted in to probabilistic values. We observed how to get password authentication using Probabilistic values for Textual passwords and Graphical passwords. This study proposes the use of a Hopfield neural network technique for password authentication. In comparison to existing layered neural network techniques, the proposed method provides better accuracy and quicker response time to registration and password changes.
△ Less
Submitted 5 August, 2011;
originally announced August 2011.
-
A High Speed Networked Signal Processing Platform for Multi-element Radio Telescopes
Authors:
Peeyush Prasad,
C. R. Subrahmanya
Abstract:
A new architecture is presented for a Networked Signal Processing System (NSPS) suitable for handling the real-time signal processing of multi-element radio telescopes. In this system, a multi-element radio telescope is viewed as an application of a multi-sensor, data fusion problem which can be decomposed into a general set of computing and network components for which a practical and scalable ar…
▽ More
A new architecture is presented for a Networked Signal Processing System (NSPS) suitable for handling the real-time signal processing of multi-element radio telescopes. In this system, a multi-element radio telescope is viewed as an application of a multi-sensor, data fusion problem which can be decomposed into a general set of computing and network components for which a practical and scalable architecture is enabled by current technology. The need for such a system arose in the context of an ongoing program for reconfiguring the Ooty Radio Telescope (ORT) as a programmable 264-element array, which will enable several new observing capabilities for large scale surveys on this mature telescope. For this application, it is necessary to manage, route and combine large volumes of data whose real-time collation requires large I/O bandwidths to be sustained. Since these are general requirements of many multi-sensor fusion applications, we first describe the basic architecture of the NSPS in terms of a Fusion Tree before elaborating on its application for the ORT. The paper addresses issues relating to high speed distributed data acquisition, Field Programmable Gate Array (FPGA) based peer-to-peer networks supporting significant on-the fly processing while routing, and providing a last mile interface to a typical commodity network like Gigabit Ethernet. The system is fundamentally a pair of two co-operative networks, among which one is part of a commodity high performance computer cluster and the other is based on Commercial-Off The-Shelf (COTS) technology with support from software/firmware components in the public domain.
△ Less
Submitted 1 February, 2011;
originally announced February 2011.