-
Investigating Group Distributionally Robust Optimization for Deep Imbalanced Learning: A Case Study of Binary Tabular Data Classification
Authors:
Ismail. B. Mustapha,
Shafaatunnur Hasan,
Hatem S Y Nabbus,
Mohamed Mostafa Ali Montaser,
Sunday Olusanya Olatunji,
Siti Maryam Shamsuddin
Abstract:
One of the most studied machine learning challenges that recent studies have shown the susceptibility of deep neural networks to is the class imbalance problem. While concerted research efforts in this direction have been notable in recent years, findings have shown that the canonical learning objective, empirical risk minimization (ERM), is unable to achieve optimal imbalance learning in deep neu…
▽ More
One of the most studied machine learning challenges that recent studies have shown the susceptibility of deep neural networks to is the class imbalance problem. While concerted research efforts in this direction have been notable in recent years, findings have shown that the canonical learning objective, empirical risk minimization (ERM), is unable to achieve optimal imbalance learning in deep neural networks given its bias to the majority class. An alternative learning objective, group distributionally robust optimization (gDRO), is investigated in this study for imbalance learning, focusing on tabular imbalanced data as against image data that has dominated deep imbalance learning research. Contrary to minimizing average per instance loss as in ERM, gDRO seeks to minimize the worst group loss over the training data. Experimental findings in comparison with ERM and classical imbalance methods using four popularly used evaluation metrics in imbalance learning across several benchmark imbalance binary tabular data of varying imbalance ratios reveal impressive performance of gDRO, outperforming other compared methods in terms of g-mean and roc-auc.
△ Less
Submitted 4 March, 2023;
originally announced March 2023.
-
Effective Email Spam Detection System using Extreme Gradient Boosting
Authors:
Ismail B. Mustapha,
Shafaatunnur Hasan,
Sunday O. Olatunji,
Siti Mariyam Shamsuddin,
Afolabi Kazeem
Abstract:
The popularity, cost-effectiveness and ease of information exchange that electronic mails offer to electronic device users has been plagued with the rising number of unsolicited or spam emails. Driven by the need to protect email users from this growing menace, research in spam email filtering/detection systems has being increasingly active in the last decade. However, the adaptive nature of spam…
▽ More
The popularity, cost-effectiveness and ease of information exchange that electronic mails offer to electronic device users has been plagued with the rising number of unsolicited or spam emails. Driven by the need to protect email users from this growing menace, research in spam email filtering/detection systems has being increasingly active in the last decade. However, the adaptive nature of spam emails has often rendered most of these systems ineffective. While several spam detection models have been reported in literature, the reported performance on an out of sample test data shows the room for more improvement. Presented in this research is an improved spam detection model based on Extreme Gradient Boosting (XGBoost) which to the best of our knowledge has received little attention spam email detection problems. Experimental results show that the proposed model outperforms earlier approaches across a wide range of evaluation metrics. A thorough analysis of the model results in comparison to the results of earlier works is also presented.
△ Less
Submitted 27 December, 2020;
originally announced December 2020.
-
A review on handwritten character and numeral recognition for Roman, Arabic, Chinese and Indian scripts
Authors:
Aini Najwa Azmi,
Dewi Nasien,
Siti Mariyam Shamsuddin
Abstract:
There are a lot of intensive researches on handwritten character recognition (HCR) for almost past four decades. The research has been done on some of popular scripts such as Roman, Arabic, Chinese and Indian. In this paper we present a review on HCR work on the four popular scripts. We have summarized most of the published paper from 2005 to recent and also analyzed the various methods in creatin…
▽ More
There are a lot of intensive researches on handwritten character recognition (HCR) for almost past four decades. The research has been done on some of popular scripts such as Roman, Arabic, Chinese and Indian. In this paper we present a review on HCR work on the four popular scripts. We have summarized most of the published paper from 2005 to recent and also analyzed the various methods in creating a robust HCR system. We also added some future direction of research on HCR.
△ Less
Submitted 22 August, 2013;
originally announced August 2013.