-
HMSViT: A Hierarchical Masked Self-Supervised Vision Transformer for Corneal Nerve Segmentation and Diabetic Neuropathy Diagnosis
Authors:
Xin Zhang,
Liangxiu Han,
Yue Shi,
Yanlin Zheng,
Uazman Alam,
Maryam Ferdousi,
Rayaz Malik
Abstract:
Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, requiring early detection. Corneal Confocal Microscopy (CCM) enables non-invasive diagnosis, but automated methods suffer from inefficient feature extraction, reliance on handcrafted priors, and data limitations. We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer designed for corneal nerve segmentation and DPN diagnosis. Unlike existing methods, HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding, enabling efficient multi-scale feature extraction by capturing fine-grained local details in early layers and integrating global context in deeper layers, all at a lower computational cost. A block-masked self-supervised learning (SSL) framework designed for HMSViT reduces reliance on labelled data and enhances feature robustness, while a multi-scale decoder fuses hierarchical features for segmentation and classification. Experiments on clinical CCM datasets show that HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming leading hierarchical models such as the Swin Transformer and HiViT by margins of up to 6.39% in segmentation accuracy while using fewer parameters. Detailed ablation studies further reveal that integrating block-masked SSL with hierarchical multi-scale feature extraction substantially enhances performance compared to conventional supervised training. Overall, these comprehensive experiments confirm that HMSViT delivers excellent, robust, and clinically viable results, demonstrating its potential for scalable deployment in real-world diagnostic applications.
Submitted 30 June, 2025; v1 submitted 24 June, 2025;
originally announced June 2025.
-
Open-Ended and Knowledge-Intensive Video Question Answering
Authors:
Md Zarif Ul Alam,
Hamed Zamani
Abstract:
Video question answering that requires external knowledge beyond the visual content remains a significant challenge in AI systems. While models can effectively answer questions based on direct visual observations, they often falter when faced with questions requiring broader contextual knowledge. To address this limitation, we investigate knowledge-intensive video question answering (KI-VideoQA) through the lens of multi-modal retrieval-augmented generation, with a particular focus on handling open-ended questions rather than just multiple-choice formats. Our comprehensive analysis examines various retrieval augmentation approaches using cutting-edge retrieval and vision language models, testing both zero-shot and fine-tuned configurations. We investigate several critical dimensions: the interplay between different information sources and modalities, strategies for integrating diverse multi-modal contexts, and the dynamics between query formulation and retrieval result utilization. Our findings reveal that while retrieval augmentation shows promise in improving model performance, its success is heavily dependent on the chosen modality and retrieval methodology. The study also highlights the critical role of query construction and retrieval depth optimization in effective knowledge integration. Through our proposed approach, we achieve a substantial 17.5% improvement in accuracy on multiple choice questions in the KnowIT VQA dataset, establishing new state-of-the-art performance levels.
Submitted 18 February, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
AI and the Future of Work in Africa White Paper
Authors:
Jacki O'Neill,
Vukosi Marivate,
Barbara Glover,
Winnie Karanu,
Girmaw Abebe Tadesse,
Akua Gyekye,
Anne Makena,
Wesley Rosslyn-Smith,
Matthew Grollnek,
Charity Wayua,
Rehema Baguma,
Angel Maduke,
Sarah Spencer,
Daniel Kandie,
Dennis Ndege Maari,
Natasha Mutangana,
Maxamed Axmed,
Nyambura Kamau,
Muhammad Adamu,
Frank Swaniker,
Brian Gatuguti,
Jonathan Donner,
Mark Graham,
Janet Mumo,
Caroline Mbindyo
, et al. (50 additional authors not shown)
Abstract:
This white paper is the output of a multidisciplinary workshop held in Nairobi (Nov 2023) and led by a cross-organisational team including Microsoft Research, NEPAD, Lelapa AI, and University of Oxford. The workshop brought together diverse thought-leaders from various sectors and backgrounds to discuss the implications of Generative AI for the future of work in Africa. Discussions centred around four key themes: Macroeconomic Impacts; Jobs, Skills and Labour Markets; Workers' Perspectives; and Africa-Centric AI Platforms. The white paper provides an overview of the current state and trends of generative AI and its applications in different domains, as well as the challenges and risks associated with its adoption and regulation. It represents a diverse set of perspectives to create a set of insights and recommendations which aim to encourage debate and collaborative action towards creating a dignified future of work for everyone across Africa.
Submitted 15 November, 2024;
originally announced November 2024.
-
HydroTrack: Spectroscopic Analysis Prototype Enabling Real-Time Hydration Monitoring in Wearables
Authors:
Nazim A. Belabbaci,
Mohammad Arif Ul Alam
Abstract:
In the rapidly growing field of wearable technology, optical devices are emerging as a significant innovation, offering non-invasive methods for analyzing skin and underlying tissue properties. Despite their promise, progress has been slowed by a lack of specialized prototypes and advanced analysis techniques. Addressing this gap, our study introduces HydroTrack, an 18-channel spectroscopy sensor embedded in a smart-watch. Accompanying this hardware, we present signal processing and data analysis techniques implemented at the edge, designed to maximize the utility of our system in comprehensive health tracking. A pivotal application of our device is the real-time assessment of hydration levels in physically active individuals. We validated our prototype and analytical approach through experiments on six participants, focusing on hydration dynamics during physical exercises. Our findings reveal an average accuracy of 95% in determining hydration states.
Submitted 12 June, 2024;
originally announced July 2024.
-
Enhancing Wearable based Real-Time Glucose Monitoring via Phasic Image Representation Learning based Deep Learning
Authors:
Yidong Zhu,
Nadia B Aimandi,
Mohammad Arif Ul Alam
Abstract:
In the U.S., over a third of adults are pre-diabetic, with 80% unaware of their status. This underlines the need for better glucose monitoring to prevent type 2 diabetes and related heart diseases. Existing wearable glucose monitors are limited by the lack of models trained on small datasets, as collecting extensive glucose data is often costly and impractical. Our study introduces a novel machine learning method using modified recurrence plots in the frequency domain to improve glucose level prediction accuracy from wearable device data, even with limited datasets. This technique combines advanced signal processing with machine learning to extract more meaningful features. We tested our method against existing models using historical data, showing that our approach surpasses the current 87% accuracy benchmark in predicting real-time interstitial glucose levels.
Submitted 12 June, 2024;
originally announced June 2024.
-
Connecting the Dots: Leveraging Spatio-Temporal Graph Neural Networks for Accurate Bangla Sign Language Recognition
Authors:
Haz Sameen Shahgir,
Khondker Salman Sayeed,
Md Toki Tahmid,
Tanjeem Azwad Zaman,
Md. Zarif Ul Alam
Abstract:
Recent advances in Deep Learning and Computer Vision have been successfully leveraged to serve marginalized communities in various contexts. One such area is Sign Language - a primary means of communication for the deaf community. However, so far, the bulk of research efforts and investments have gone into American Sign Language, and research activity into low-resource sign languages - especially Bangla Sign Language - has lagged significantly. In this research paper, we present a new word-level Bangla Sign Language dataset - BdSL40 - consisting of 611 videos over 40 words, along with two different approaches: one with a 3D Convolutional Neural Network model and another with a novel Graph Neural Network approach for the classification of BdSL40 dataset. This is the first study on word-level BdSL recognition, and the dataset was transcribed from Indian Sign Language (ISL) using the Bangla Sign Language Dictionary (1997). The proposed GNN model achieved an F1 score of 89%. The study highlights the significant lexical and semantic similarity between BdSL, West Bengal Sign Language, and ISL, and the lack of word-level datasets for BdSL in the literature. We release the dataset and source code to stimulate further research.
Submitted 22 January, 2024;
originally announced January 2024.
-
PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition
Authors:
Md. Zarif Ul Alam,
Md Saiful Islam,
Ehsan Hoque,
M Saifur Rahman
Abstract:
Parkinson's disease (PD) is a neuro-degenerative disorder that affects movement, speech, and coordination. Timely diagnosis and treatment can improve the quality of life for PD patients. However, access to clinical diagnosis is limited in low and middle income countries (LMICs). Therefore, development of automated screening tools for PD can have a huge social impact, particularly in the public health sector. In this paper, we present PULSAR, a novel method to screen for PD from webcam-recorded videos of the finger-tapping task from the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS). PULSAR is trained and evaluated on data collected from 382 participants (183 self-reported as PD patients). We used an adaptive graph convolutional neural network to dynamically learn the spatio-temporal graph edges specific to the finger-tapping task. We enhanced this idea with a multi-stream adaptive convolution model to learn features from different modalities of data critical to detecting PD, such as the relative location of the finger joints and the velocity and acceleration of tapping. As the labels of the videos are self-reported, there could be cases of undiagnosed PD in the non-PD labeled samples. We leveraged the idea of Positive Unlabeled (PU) Learning, which does not need labeled negative data. Our experiments show a clear benefit of modeling the problem in this way. PULSAR achieved 80.95% accuracy on the validation set and a mean accuracy of 71.29% (2.49% standard deviation) on an independent test set, despite being trained with a limited amount of data. This is especially promising as labeled data is scarce in the health care sector. We hope PULSAR will make PD screening more accessible to everyone. The proposed techniques could be extended to the assessment of other movement disorders, such as ataxia and Huntington's disease.
Submitted 16 February, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio
Authors:
Forsad Al Hossain,
Tanjid Hasan Tonmoy,
Andrew A. Lover,
George A. Corey,
Mohammad Arif Ul Alam,
Tauhidur Rahman
Abstract:
Privacy-preserving crowd density analysis finds application across a wide range of scenarios, substantially enhancing smart building operation and management while upholding privacy expectations in various spaces. We propose a non-speech audio-based approach for crowd analytics, leveraging a transformer-based model. Our results demonstrate that non-speech audio alone can be used to conduct such analysis with remarkable accuracy. To the best of our knowledge, this is the first time non-speech audio signals have been proposed for predicting occupancy. To accomplish this, we deployed our sensor-based platform in the waiting room of a large hospital with IRB approval over a period of several months to capture non-speech audio and thermal images for the training and evaluation of our models. The proposed non-speech-based approach outperformed the thermal camera-based model and all other baselines. In addition to demonstrating superior performance without utilizing speech audio, we conduct further analysis using differential privacy techniques to provide additional privacy guarantees. Overall, our work demonstrates the viability of employing non-speech audio data for accurate occupancy estimation, while also ensuring the exclusion of speech-related content and providing robust privacy protections through differential privacy guarantees.
Submitted 20 September, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
-
SHAMSUL: Systematic Holistic Analysis to investigate Medical Significance Utilizing Local interpretability methods in deep learning for chest radiography pathology prediction
Authors:
Mahbub Ul Alam,
Jaakko Hollmén,
Jón Rúnar Baldvinsson,
Rahim Rahmani
Abstract:
The interpretability of deep neural networks has become a subject of great interest within the medical and healthcare domain. This attention stems from concerns regarding transparency, legal and ethical considerations, and the medical significance of predictions generated by these deep neural networks in clinical decision support systems. To address this matter, our study delves into the application of four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Leveraging the approach of transfer learning with a multi-label-multi-class chest radiography dataset, we aim to interpret predictions pertaining to specific pathology classes. Our analysis encompasses both single-label and multi-label predictions, providing a comprehensive and unbiased assessment through quantitative and qualitative investigations, which are compared against human expert annotation. Notably, Grad-CAM demonstrates the most favorable performance in quantitative evaluation, while the LIME heatmap score segmentation visualization exhibits the highest level of medical significance. Our research underscores both the outcomes and the challenges faced in the holistic approach adopted for assessing these interpretability methods and suggests that a multimodal-based approach, incorporating diverse sources of information beyond chest radiography images, could offer additional insights for enhancing interpretability in the medical domain.
Submitted 17 November, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Wearable-based Fair and Accurate Pain Assessment Using Multi-Attribute Fairness Loss in Convolutional Neural Networks
Authors:
Yidong Zhu,
Shao-Hsien Liu,
Mohammad Arif Ul Alam
Abstract:
The integration of diverse health data, such as IoT (Internet of Things), EHR (Electronic Health Record), and clinical surveys, with scalable AI (Artificial Intelligence) has enabled the identification of physical, behavioral, and psycho-social indicators of pain. However, the adoption of AI in clinical pain evaluation is hindered by challenges like personalization and fairness. Many AI models, including machine and deep learning, exhibit biases, discriminating against specific groups based on gender or ethnicity, causing skepticism among medical professionals about their reliability. This paper proposes a Multi-attribute Fairness Loss (MAFL) based Convolutional Neural Network (CNN) model designed to account for protected attributes in data, ensuring fair pain status predictions while minimizing disparities between privileged and unprivileged groups. We evaluate whether a balance between accuracy and fairness is achievable by comparing the proposed model with existing mitigation methods. Our findings indicate that the model performs favorably against state-of-the-art techniques. Using the NIH All-Of-US dataset, comprising data from 868 individuals over 1500 days, we demonstrate our model's effectiveness, achieving accuracy rates between 75% and 85%.
Submitted 16 February, 2025; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Internet of Things Fault Detection and Classification via Multitask Learning
Authors:
Mohammad Arif Ul Alam
Abstract:
This paper presents a comprehensive investigation into developing a fault detection and classification system for real-world IIoT applications. The study addresses challenges in data collection, annotation, algorithm development, and deployment. Using a real-world IIoT system, three phases of data collection simulate 11 predefined fault categories. We propose SMTCNN for fault detection and category classification in IIoT, evaluating its performance on real-world data. SMTCNN achieves superior specificity (3.5%) and shows significant improvements in precision, recall, and F1 measures compared to existing techniques.
Submitted 3 July, 2023;
originally announced July 2023.
-
Augmenting Deep Learning Adaptation for Wearable Sensor Data through Combined Temporal-Frequency Image Encoding
Authors:
Yidong Zhu,
Md Mahmudur Rahman,
Mohammad Arif Ul Alam
Abstract:
Deep learning advancements have revolutionized scalable classification in many domains including computer vision. However, when it comes to wearable-based classification and domain adaptation, existing computer vision-based deep learning architectures and pretrained models trained on thousands of labeled images for months fall short. This is primarily because wearable sensor data necessitates sensor-specific preprocessing, architectural modification, and extensive data collection. To overcome these challenges, researchers have proposed encoding wearable temporal sensor data as images using recurrence plots. In this paper, we present a novel modified recurrence plot-based image representation that seamlessly integrates both temporal and frequency domain information. Our approach incorporates an efficient Fourier transform-based frequency domain angular difference estimation scheme in conjunction with the existing temporal recurrence plot image. Furthermore, we employ mixup image augmentation to enhance the representation. We evaluate the proposed method using accelerometer-based activity recognition data and a pretrained ResNet model, and demonstrate its superior performance compared to existing approaches.
Submitted 3 July, 2023;
originally announced July 2023.
-
BanglaCoNER: Towards Robust Bangla Complex Named Entity Recognition
Authors:
HAZ Sameen Shahgir,
Ramisa Alam,
Md. Zarif Ul Alam
Abstract:
Named Entity Recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying named entities in text. However, little work has been done on complex named entity recognition (CNER) in Bangla, despite it being the seventh most spoken language globally. CNER is a more challenging task than traditional NER as it involves identifying and classifying complex and compound entities, which are not common in the Bangla language. In this paper, we present the winning solution of the Bangla Complex Named Entity Recognition Challenge - addressing the CNER task on the BanglaCoNER dataset using two different approaches, namely Conditional Random Fields (CRF) and finetuning transformer based Deep Learning models such as BanglaBERT.
The dataset consisted of 15300 sentences for training and 800 sentences for validation, in the .conll format. Exploratory Data Analysis (EDA) on the dataset revealed that the dataset had 7 different NER tags, with a notable presence of English words, suggesting that the dataset is synthetic and likely a product of translation.
We experimented with a variety of feature combinations including Part of Speech (POS) tags, word suffixes, Gazetteers, and cluster information from embeddings, while also finetuning the BanglaBERT (large) model for NER. We found that not all linguistic patterns are immediately apparent or even intuitive to humans, which is why Deep Learning based models have proved to be more effective in NLP, including the CNER task. Our finetuned BanglaBERT (large) model achieves an F1 Score of 0.79 on the validation set. Overall, our study highlights the importance of Bangla Complex Named Entity Recognition, particularly in the context of synthetic datasets. Our findings also demonstrate the efficacy of Deep Learning models such as BanglaBERT for NER in the Bangla language.
Submitted 17 March, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
PhysioGait: Context-Aware Physiological Context Modeling for Person Re-identification Attack on Wearable Sensing
Authors:
James O Sullivan,
Mohammad Arif Ul Alam
Abstract:
Person re-identification is a critical privacy breach in publicly shared healthcare data. We investigate the possibility of a new type of privacy threat on publicly shared, privacy-insensitive, large-scale wearable sensing data. In this paper, we investigate user-specific biometric signatures in terms of two contextual biometric traits: physiological (photoplethysmography and electrodermal activity) and physical (accelerometer) contexts. To this end, we propose PhysioGait, a context-aware physiological signal model that consists of a Multi-Modal Siamese Convolutional Neural Network (mmSNN) which learns the spatial and temporal information individually and performs sensor fusion in a Siamese cost with the objective of predicting a person's identity. We evaluated the PhysioGait attack model using 4 datasets collected in real time (3 under IRB #HP-00064387 and one publicly available) and two combined datasets, achieving 89% - 93% accuracy in re-identifying persons.
Submitted 29 October, 2022;
originally announced November 2022.
-
Enabling Heterogeneous Domain Adaptation in Multi-inhabitants Smart Home Activity Learning
Authors:
Md Mahmudur Rahman,
Mahta Mousavi,
Peri Tarr,
Mohammad Arif Ul Alam
Abstract:
Domain adaptation for sensor-based activity learning is of utmost importance in remote health monitoring research. However, many domain adaptation algorithms fail to adapt in the presence of target domain heterogeneity (which is always present in reality), and the presence of multiple inhabitants dramatically hinders their generalizability, producing unsatisfactory results for semi-supervised and unseen activity learning tasks. We propose AEDA, a novel deep auto-encoder-based model that enables semi-supervised domain adaptation in the presence of target domain heterogeneity, and show how to incorporate it into any homogeneous deep domain adaptation architecture to handle heterogeneity in cross-domain activity learning. Experimental evaluation on 18 different heterogeneous and multi-inhabitant use-cases across 8 different domains created from 2 publicly available human activity datasets (wearable and ambient smart homes) shows that AEDA outperforms existing domain adaptation techniques (max. 12.8% and 8.9% improvements for ambient smart home and wearables, respectively) for both seen and unseen activity learning in a heterogeneous setting.
Submitted 17 October, 2022;
originally announced October 2022.
-
Semi-Supervised Domain Adaptation with Auto-Encoder via Simultaneous Learning
Authors:
Md Mahmudur Rahman,
Rameswar Panda,
Mohammad Arif Ul Alam
Abstract:
We present a new semi-supervised domain adaptation framework that combines a novel auto-encoder-based domain adaptation model with a simultaneous learning scheme, providing stable improvements over state-of-the-art domain adaptation models. Our framework has a strong distribution matching property, achieved by training both source and target auto-encoders using a novel simultaneous learning scheme on a single graph with an optimally modified MMD loss objective function. Additionally, we design a semi-supervised classification approach by transferring the aligned domain invariant feature spaces from the source domain to the target domain. We evaluate on three datasets and show that our framework can effectively solve both the fragile convergence (adversarial) and weak distribution matching (discrepancy) problems between source and target feature spaces with a high 'speed' of adaptation, requiring a very low number of iterations.
Submitted 17 October, 2022;
originally announced October 2022.
-
College Student Retention Risk Analysis From Educational Database using Multi-Task Multi-Modal Neural Fusion
Authors:
Mohammad Arif Ul Alam
Abstract:
We develop a Multimodal Spatiotemporal Neural Fusion network for Multi-Task Learning (MSNF-MTCL) to predict 5 important student retention risks: future dropout, next semester dropout, type of dropout, duration of dropout and cause of dropout. First, we develop a general purpose multi-modal neural fusion network model, MSNF, for learning students' academic information representation by fusing spatial and temporal unstructured advising notes with spatiotemporal structured data. MSNF combines a Bidirectional Encoder Representations from Transformers (BERT)-based document embedding framework to represent each advising note, a Long Short-Term Memory (LSTM) network to model temporal advising note embeddings, and an LSTM network to model students' temporal performance variables and static demographics together. The final fused representation from MSNF is used in a Multi-Task Cascade Learning (MTCL) model to build MSNF-MTCL for predicting the 5 student retention risks. We evaluate MSNF-MTCL on a large educational database consisting of 36,445 college students over an 18-year period, where it shows promising performance compared with the nearest state-of-the-art models. Additionally, we test the fairness of the model given the existence of biases.
Submitted 11 September, 2021;
originally announced September 2021.
-
PALMAR: Towards Adaptive Multi-inhabitant Activity Recognition in Point-Cloud Technology
Authors:
Mohammad Arif Ul Alam,
Md Mahmudur Rahman,
Jared Q Widberg
Abstract:
With the advancement of deep neural networks and computer vision-based Human Activity Recognition (HAR), the use of Point-Cloud Data (PCD) technologies (LiDAR, mmWave) has attracted considerable interest due to their privacy-preserving nature. Given the high promise of accurate PCD technologies, we develop PALMAR, a multiple-inhabitant activity recognition system that employs efficient signal processing and novel machine learning techniques to track individual persons, towards an adaptive multi-inhabitant tracking and HAR system. More specifically, we propose (i) a voxelized feature representation-based real-time PCD fine-tuning method, (ii) efficient clustering (DBSCAN and BIRCH), Adaptive Order Hidden Markov Model based multi-person tracking and crossover ambiguity reduction techniques and (iii) a novel adaptive deep learning-based domain adaptation technique to improve the accuracy of HAR in the presence of data scarcity and diversity (device, location and population diversity). We experimentally evaluate our framework and systems using (i) real-time PCD collected by three devices (3D LiDAR and 79 GHz mmWave) from 6 participants, (ii) one publicly available 3D LiDAR activity dataset (28 participants) and (iii) an embedded hardware prototype system, which provided promising HAR performance in the multi-inhabitant scenario (96%) with a 63% improvement in multi-person tracking over a state-of-the-art framework, without significant loss of system performance on the edge computing device.
Submitted 2 December, 2022; v1 submitted 22 June, 2021;
originally announced June 2021.
-
Person Re-identification Attack on Wearable Sensing
Authors:
Mohammad Arif Ul Alam
Abstract:
Person re-identification is a critical privacy attack on publicly shared healthcare data under the Health Insurance Portability and Accountability Act (HIPAA) privacy rule. In this paper, we investigate the possibility of a new type of privacy attack, the Person Re-identification Attack (PRI-attack), on publicly shared privacy-insensitive wearable data. We investigate users' specific biometric signatures in terms of two contextual biometric traits: physiological (photoplethysmography and electrodermal activity) and physical (accelerometer) contexts. In this regard, we develop a Multi-Modal Siamese Convolutional Neural Network (mmSNN) model. The framework learns the spatial and temporal information individually and combines them in a modified weighted cost with the objective of predicting a person's identity. We evaluate our proposed model using real-time data from 3 collected datasets and one publicly available dataset. Our proposed framework shows that PPG-based breathing rate and heart rate, in conjunction with hand gesture contexts, can be used by attackers to re-identify a user's identity (max. 71%) from HIPAA-compliant wearable data. Given that publicly placed cameras can remotely estimate heart rate and breathing rate along with hand gestures, person re-identification using them poses a significant threat to future HIPAA-compliant servers, which will require a better encryption method to store wearable healthcare data.
Submitted 22 June, 2021;
originally announced June 2021.
-
Estimating Heterogeneous Causal Effect of Polysubstance Usage on Drug Overdose from Large-Scale Electronic Health Record
Authors:
Vaishali Mahipal,
Mohammad Arif Ul Alam
Abstract:
Drug overdose has become a public health crisis in the United States with devastating consequences. However, most drug overdose incidents are the consequence of repetitive polysubstance usage over a defined period of time, which can happen either through the intentional use of a required drug with other drugs or by accident. Thus, predicting the effects of polysubstance usage is extremely important for clinicians deciding which combination of drugs should be prescribed. Recent advances in structural causal models can provide ample insight into causal effects from observational data via identifiable causal directed graphs. In this paper, we propose a system to estimate heterogeneous concurrent drug usage effects on overdose, consisting of efficient covariate selection, sub-group selection and heterogeneous causal effect estimation. We apply our framework to answer a critical question: can concurrent usage of benzodiazepines and opioids have heterogeneous causal effects on the opioid overdose epidemic? Experiments using Truven MarketScan claims data collected from 2001 to 2013 show significant promise for our proposed framework's efficacy.
Submitted 12 April, 2022; v1 submitted 15 May, 2021;
originally announced May 2021.
-
Knowledge Transfer across Imaging Modalities Via Simultaneous Learning of Adaptive Autoencoders for High-Fidelity Mobile Robot Vision
Authors:
Md Mahmudur Rahman,
Tauhidur Rahman,
Donghyun Kim,
Mohammad Arif Ul Alam
Abstract:
Enabling mobile robots for solving challenging and diverse shape, texture, and motion related tasks with high fidelity vision requires the integration of novel multimodal imaging sensors and advanced fusion techniques. However, it is associated with high cost, power, hardware modification, and computing requirements which limit its scalability. In this paper, we propose a novel Simultaneously Learned Auto Encoder Domain Adaptation (SAEDA)-based transfer learning technique to empower noisy sensing with advanced sensor suite capabilities. In this regard, SAEDA trains both source and target auto-encoders together on a single graph to obtain the domain invariant feature space between the source and target domains on simultaneously collected data. Then, it uses the domain invariant feature space to transfer knowledge between different signal modalities. The evaluation has been done on two collected datasets (LiDAR and Radar) and one existing dataset (LiDAR, Radar and Video) which provides a significant improvement in quadruped robot-based classification (home floor and human activity recognition) and regression (surface roughness estimation) problems. We also integrate our sensor suite and SAEDA framework on two real-time systems (vacuum cleaning and Mini-Cheetah quadruped robots) for studying the feasibility and usability.
Submitted 2 September, 2021; v1 submitted 11 May, 2021;
originally announced May 2021.
-
Activity-Aware Deep Cognitive Fatigue Assessment using Wearables
Authors:
Mohammad Arif Ul Alam
Abstract:
Cognitive fatigue has long been a common problem among workers, and it has become an increasing global problem since the emergence of COVID-19 as a global pandemic. While existing multi-modal wearable sensor-aided automatic cognitive fatigue monitoring tools have focused on physical and physiological sensor (ECG, PPG, Actigraphy) analytics for specific groups of people (say gamers, athletes, construction workers), activity awareness is of utmost importance because activities affect physiology differently in different people. In this paper, we propose a novel framework, Activity-Aware Recurrent Neural Network (AcRoNN), that can generalize individual activity recognition and significantly improve cognitive fatigue estimation. We evaluate and compare our proposed method with state-of-the-art methods using one real-time collected dataset from 5 individuals and another publicly available dataset from 27 individuals, achieving a max. 19% improvement.
Submitted 5 May, 2021;
originally announced May 2021.
-
Monitoring My Dehydration: A Non-Invasive Dehydration Alert System Using Electrodermal Activity
Authors:
Nandan Kulkarni,
Christopher Compton,
Jooseppi Luna,
Mohammad Arif Ul Alam
Abstract:
Staying hydrated and drinking fluids is crucial for staying healthy and maintaining even basic bodily functions. Studies have shown that dehydration leads to loss of productivity and impaired cognition and mood in both men and women. However, no existing tool can monitor dehydration continuously and alert users before it affects their health. In this paper, we propose to use wearable Electrodermal Activity (EDA) sensors in conjunction with signal processing and machine learning techniques to develop, for the first time, a dehydration self-monitoring tool, Monitoring My Dehydration (MMD), that can instantly detect the hydration level of human skin. Moreover, we develop an Android application that connects over Bluetooth to a wristband with an integrated wearable EDA sensor to track users' hydration levels in real time and instantly alert users when the hydration level goes beyond the danger level. To validate our tool's performance, we recruited 5 users, carefully designed water intake routines to annotate the dehydration ground truth, and trained state-of-the-art machine learning models to predict the instantaneous hydration level, i.e., well-hydrated, hydrated, dehydrated and very dehydrated. Our system achieves an accuracy of 84.5% in estimating dehydration level, with a sensitivity of 87.5% and a specificity of 90.3%, which gives us confidence to move forward with our method in a larger longitudinal study.
Submitted 25 September, 2020;
originally announced September 2020.
-
AutoCogniSys: IoT Assisted Context-Aware Automatic Cognitive Health Assessment
Authors:
Mohammad Arif Ul Alam,
Nirmalya Roy,
Sarah Holmes,
Aryya Gangopadhyay,
Elizabeth Galik
Abstract:
Cognitive impairment has become epidemic in the older adult population. The recent advent of tiny wearable and ambient devices, a.k.a. the Internet of Things (IoT), provides ample platforms for continuous functional and cognitive health assessment of older adults. In this paper, we design, implement and evaluate AutoCogniSys, a context-aware automated cognitive health assessment system that combines the sensing powers of wearable physiological (Electrodermal Activity, Photoplethysmography) and physical (Accelerometer, Object) sensors in conjunction with ambient sensors. We design appropriate signal processing and machine learning techniques, and develop an automatic cognitive health assessment system for the natural living environment of older adults. We validate our approaches using two datasets: (i) naturalistic sensor data streams related to Activities of Daily Living and mental arousal of 22 older adults recruited in a retirement community center, individually living in their own apartments, collected using a customized inexpensive IoT system (IRB #HP-00064387) and (ii) a publicly available dataset for emotion detection. AutoCogniSys attains a max. 93% accuracy in assessing the cognitive health of older adults.
Submitted 16 March, 2020;
originally announced March 2020.
-
LAXARY: A Trustworthy Explainable Twitter Analysis Model for Post-Traumatic Stress Disorder Assessment
Authors:
Mohammad Arif Ul Alam,
Dhawal Kapadia
Abstract:
Veteran mental health is a significant national problem as a large number of veterans are returning from the recent war in Iraq and the continued military presence in Afghanistan. While significant existing works have investigated Twitter post-based Post-Traumatic Stress Disorder (PTSD) assessment using black-box machine learning techniques, these frameworks cannot be trusted by clinicians due to their lack of clinical explainability. To obtain the trust of clinicians, we explore the big question: can Twitter posts provide enough information to fill out the clinical PTSD assessment surveys that have traditionally been trusted by clinicians? To answer this question, we propose the LAXARY (Linguistic Analysis-based Explainable Inquiry) model, a novel Explainable Artificial Intelligence (XAI) model to detect and represent the PTSD assessment of Twitter users using a modified Linguistic Inquiry and Word Count (LIWC) analysis. First, we employ clinically validated survey tools to collect clinical PTSD assessment data from real Twitter users and develop a PTSD Linguistic Dictionary using the PTSD assessment survey results. Then, we use the PTSD Linguistic Dictionary along with a machine learning model to fill out the survey tools in order to detect the PTSD status, and its intensity, of the corresponding Twitter users. Our experimental evaluation on 210 clinically validated veteran Twitter users provides promising accuracies for both PTSD classification and its intensity estimation. We also evaluate the reliability and validity of our PTSD Linguistic Dictionary.
Submitted 20 July, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
Reflecting After Learning for Understanding
Authors:
Lee Martie,
Mohammad Arif Ul Alam,
Gaoyuan Zhang,
Ryan R. Anderson
Abstract:
Today, image classification is a common way for systems to process visual content. Although neural network approaches to classification have seen great progress in reducing error rates, it is not clear what this means for a cognitive system that needs to make sense of the multiple and competing predictions from its own classifiers. As a step to address this, we present a novel framework that uses meta-reasoning and meta-operations to unify predictions into abstractions, properties, or relationships. Using the framework on images from ImageNet, we demonstrate systems that unify 41% to 46% of predictions in general and unify 67% to 75% of predictions when the systems can explain their conceptual differences. We also demonstrate a system in "the wild" by feeding live video images through it and show it unifying 51% of predictions in general and 69% of predictions when their differences can be explained conceptually by the system. In a survey given to 24 participants, we found that 87% of the unified predictions describe their corresponding images.
Submitted 17 October, 2019;
originally announced October 2019.
-
An Efficient Genetic Algorithm for Discovering Diverse-Frequent Patterns
Authors:
Shanjida Khatun,
Hasib Ul Alam,
Swakkhar Shatabda
Abstract:
Working with exhaustive search on large datasets is infeasible for several reasons. Recently developed techniques have made pattern set mining feasible with a general solver that supports heuristic search, but they have long execution times and are limited to small datasets. In this paper, we investigate an approach that uses a genetic algorithm to mine a diverse set of frequent patterns. We propose a fast heuristic search algorithm that outperforms state-of-the-art methods on a standard set of benchmarks and is capable of producing satisfactory results within a short period of time. Our proposed algorithm uses a relative encoding scheme for the patterns and an effective twin removal technique to ensure diversity throughout the search.
Submitted 19 July, 2015;
originally announced July 2015.
-
Heuristic based task scheduling in multiprocessor systems with genetic algorithm by choosing the eligible processor
Authors:
Probir Roy,
Md. Mejbah Ul Alam,
Nishita Das
Abstract:
In multiprocessor systems, one of the main factors in system performance is task scheduling. The better the tasks are distributed among the processors, the better the performance. However, finding the optimal schedule of tasks on the processors is NP-complete, that is, it takes a lot of time to find the optimal solution. Many evolutionary algorithms (e.g. Genetic Algorithm, Simulated Annealing) are used to reach a near-optimal solution in linear time. In this paper we propose a heuristic for genetic algorithm-based task scheduling in multiprocessor systems that chooses the eligible processor by educated guess. A comparison shows that this new heuristic-based GA takes less computation time to reach the suboptimal solution.
Submitted 9 August, 2012;
originally announced August 2012.
-
A more appropriate Protein Classification using Data Mining
Authors:
Muhammad Mahbubur Rahman,
Arif Ul Alam,
Abdullah-Al-Mamun,
Tamnun E Mursalin
Abstract:
Research in bioinformatics is complex because it spans two knowledge domains, namely the biological and computer sciences. This paper introduces an efficient data mining approach for classifying proteins into useful groups by representing them in a hierarchical tree structure. Several techniques are used to classify proteins, but most of them have drawbacks in their grouping. Among them, the most efficient grouping technique is the one used by PSIMAP (Protein Structural Interactome Map). Although the PSIMAP technique successfully incorporates most proteins, it fails to classify proteins with the scale-free property. Our technique overcomes this drawback and successfully maps all proteins into different groups, including the scale-free property proteins that PSIMAP fails to group. Our approach selects six major attributes of a protein, a) Structure Comparison b) Sequence Comparison c) Connectivity d) Cluster Index e) Interactivity f) Taxonomy, to group the proteins from the databank by generating a hierarchical tree structure. The proposed approach calculates the degree (probability) of similarity of each protein newly entered into the system against the existing proteins in the system by applying probability theory to each of the six protein properties.
Submitted 12 October, 2011;
originally announced November 2011.