-
Development of a WAZOBIA-Named Entity Recognition System
Authors:
S. E Emedem,
I. E Onyenwe,
E. G Onyedinma
Abstract:
Named Entity Recognition (NER) is crucial for various natural language processing applications, including information extraction, machine translation, and sentiment analysis. Despite the ever-increasing interest in African languages within computational linguistics, existing NER systems focus mainly on English, European, and a few other global languages, leaving a significant gap for under-resourced languages. This research presents the development of a WAZOBIA-NER system tailored for the three most prominent Nigerian languages: Hausa, Yoruba, and Igbo. The research begins with a comprehensive compilation of annotated datasets for each language, addressing data scarcity and linguistic diversity challenges. Exploring state-of-the-art approaches, namely the machine learning technique Conditional Random Fields (CRF) and deep learning models such as Bidirectional Long Short-Term Memory (BiLSTM), Bidirectional Encoder Representations from Transformers (BERT), and BERT fine-tuned with a Recurrent Neural Network (RNN), the study evaluates the effectiveness of these approaches in recognizing three entity types: persons, organizations, and locations. The system uses optical character recognition (OCR) to convert textual images into machine-readable text, enabling the Wazobia system to accept both input text and textual images for extraction. The system achieved a performance of 0.9511 in precision, 0.9400 in recall, 0.9564 in F1-score, and 0.9301 in accuracy. The model's evaluation was conducted across the three languages, with precision, recall, F1-score, and accuracy as key assessment metrics. The Wazobia-NER system demonstrates that it is feasible to build robust NER tools for under-resourced African languages using current NLP frameworks and transfer learning.
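The abstract reports precision, recall and F1-score for persons, organizations and locations but does not state how entities are scored. A common NER convention (CoNLL-style) is exact-match, entity-level scoring; the Python sketch below assumes BIO tags with the hypothetical labels PER, ORG and LOC and is not the authors' evaluation script. Because F1 is the harmonic mean of precision and recall, it always lies between the two.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    etype, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        # An open entity ends at "O", at a new "B-", or at an "I-" of another type.
        if etype is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((etype, start, i))
            etype = None
        # "B-" opens an entity; a stray "I-" is leniently treated as "B-".
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            etype, start = tag[2:], i
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)  # a prediction counts only if type and boundaries both match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG"]
    pred = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
    print(entity_prf(gold, pred))  # the ORG entity is missed
```

Scoring whole spans rather than individual tokens means a partially tagged name earns no credit, which is the stricter and more widely reported choice.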
Submitted 10 May, 2025;
originally announced May 2025.
-
Integration of LaTeX formula in computer-based test application for academic purposes
Authors:
Ikechukwu E. Onyenwe,
Ebele Onyedinma,
Onyedika O. Ikechukwu-Onyenwe,
Obinna Agbata,
Faustinah N. Tubo
Abstract:
LaTeX is a free document preparation system that typesets mathematical expressions smoothly and elegantly. It has become the standard format for creating and publishing research articles in mathematics and many scientific fields. Computer-based testing (CBT) has become widespread in recent years, and most establishments now use it to deliver assessments as an alternative to the pen-and-paper method. To deliver an assessment, the examiner first adds a new exam or edits an existing one using a CBT editor; thus, a CBT implementation should support both setting and administering questions. Existing CBT applications used in the academic space lack the capacity to handle advanced formulas, programming code, and tables, and therefore resort to converting them into images, which takes a lot of time and storage space. In this paper, we discuss how we solved this problem by integrating LaTeX into our CBT applications. This enables seamless manipulation and accurate rendering of tables, programming code, and equations, increasing readability and clarity on both the question-setting and question-administering platforms. Furthermore, this implementation has drastically reduced the system resources previously allocated to converting tables, code, and equations into images. Users in mathematics, statistics, computer science, engineering, chemistry, and related fields will find this application useful.
Submitted 13 January, 2024;
originally announced February 2024.
-
Development of an NLP-driven computer-based test guide for visually impaired students
Authors:
Tubo Faustinah Nemieboka,
Ikechukwu E. Onyenwe,
Doris C. Asogwa
Abstract:
In recent years, advancements in Natural Language Processing (NLP) techniques have revolutionized the accessibility and inclusivity of testing, particularly for visually impaired students (VIS). Computer-Based Testing (CBT) has long proven its relevance in administering exams electronically, making the test process easier, providing quicker and more accurate results, and offering greater flexibility and accessibility for candidates. Yet visually impaired students have not benefited from it, as they cannot access printed documents. Hence, in this paper, we present an NLP-driven Computer-Based Test guide for visually impaired students. It employs pre-trained speech technology models to provide real-time assistance and support to VIS. The system uses NLP technologies to convert the text-based questions and their associated options into a machine-readable format; the pre-trained speech model then processes the converted text, enabling VIS to comprehend and analyze the content. Furthermore, we validated that this pre-trained model is not perverse by testing its accuracy: sample audio dataset labels (A, B, C, D, E, F, G) were compared with the system's predictions on voice recordings obtained from 20 VIS to obtain precision, recall, and F1-scores. These metrics, used to assess the performance of the pre-trained model, indicate that it performs well enough for the evaluated system. The methodology adopted for this system is the Object-Oriented Analysis and Design Methodology (OOADM), in which objects are discussed and built by modeling real-world instances.
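The abstract evaluates the speech model by comparing spoken option labels (A to G) with the system's predictions and reporting precision, recall and F1-scores. Below is a minimal Python sketch of per-label scoring, assuming each recognized utterance has already been mapped to a single option letter. The macro averaging is an assumption, since the abstract does not say how per-label scores were combined, and the sample answers are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def per_label_scores(true: List[str], pred: List[str]) -> Dict[str, Scores]:
    """Precision, recall and F1 for each option label (e.g. A to G)."""
    tp = Counter(t for t, p in zip(true, pred) if t == p)
    true_count, pred_count = Counter(true), Counter(pred)
    scores: Dict[str, Scores] = {}
    for label in sorted(set(true) | set(pred)):
        prec = tp[label] / pred_count[label] if pred_count[label] else 0.0
        rec = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def macro_average(scores: Dict[str, Scores]) -> Scores:
    """Unweighted mean over labels, so rare options count as much as common ones."""
    n = len(scores)
    p, r, f = (sum(s[i] for s in scores.values()) / n for i in range(3))
    return p, r, f


if __name__ == "__main__":
    spoken = ["A", "B", "C", "A", "D"]      # option each student actually said
    recognised = ["A", "B", "A", "A", "D"]  # option the system predicted
    print(macro_average(per_label_scores(spoken, recognised)))
```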
Submitted 22 January, 2024;
originally announced January 2024.
-
Image Restoration: A Comparative Analysis of Image Denoising Using Different Spatial Filtering Techniques
Authors:
E. G. Onyedinma,
I. E. Onyenwe
Abstract:
Images acquired for medical and other purposes can be affected by noise from the capturing equipment or the environment. This can have an adverse effect on the information they contain, hence the need to restore the image to its original state by removing the noise. To remove such noise effectively, prior knowledge of the type of noise model present is necessary. This work explores different noise removal filters by first introducing noise to an image and then applying different spatial domain filtering techniques to the image to remove the noise. Evaluation metrics such as Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE) were adopted to determine how effective each filter is on a given image noise. Results showed that some filters are more effective on some noise models than others.
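The experiment described above (add noise, filter in the spatial domain, score with RMSE and PSNR) can be sketched in Python with NumPy. The salt-and-pepper noise model, the 3x3 mean and median filters and the synthetic gradient image are illustrative assumptions, not the paper's setup. The metrics follow the usual definitions: RMSE = sqrt(mean((I - K)^2)) and PSNR = 20 log10(MAX / RMSE), with MAX = 255 for 8-bit images.

```python
import numpy as np


def add_salt_and_pepper(img: np.ndarray, amount: float, rng: np.random.Generator) -> np.ndarray:
    """Set a fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < amount / 2] = 0
    noisy[u > 1 - amount / 2] = 255
    return noisy


def spatial_filter(img: np.ndarray, size: int, reducer) -> np.ndarray:
    """Apply a size x size neighbourhood filter, e.g. reducer=np.mean or np.median."""
    pad = size // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape
    # One shifted view per neighbourhood offset, stacked along axis 0.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(size) for dx in range(size)])
    return reducer(windows, axis=0)


def rmse(ref: np.ndarray, test: np.ndarray) -> float:
    diff = ref.astype(np.float64) - test.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    e = rmse(ref, test)
    return float("inf") if e == 0 else float(20 * np.log10(max_val / e))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.tile((np.arange(64) * 4).astype(np.uint8), (64, 1))  # synthetic gradient
    noisy = add_salt_and_pepper(clean, 0.1, rng)
    for name, out in [("noisy", noisy),
                      ("mean 3x3", spatial_filter(noisy, 3, np.mean)),
                      ("median 3x3", spatial_filter(noisy, 3, np.median))]:
        print(f"{name}: RMSE={rmse(clean, out):.2f} PSNR={psnr(clean, out):.2f} dB")
```

On impulse noise like this the median filter is expected to outperform the mean filter, because a median ignores isolated extreme values while a mean spreads them across the neighbourhood; this matches the abstract's observation that filter effectiveness depends on the noise model.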
Submitted 3 January, 2024;
originally announced January 2024.
-
Development of an intelligent system for the detection of corona virus using artificial neural network
Authors:
Nwafor Emmanuel O,
Ngozi Maryrose Umeh,
Ikechukwu Ekene Onyenwe
Abstract:
This paper presents the development of an intelligent system for the detection of coronavirus using an artificial neural network. This followed a literature review indicating that high fever accounts for 87.9% of COVID-19 symptoms. A total of 683 temperature readings of COVID-19 patients at or above 38 °C were collected from Colliery Hospital, Enugu, Nigeria, and used to train an artificial neural network model for the detection of COVID-19. The generated reference model was converted into Verilog hardware description language (HDL) code and then burned into a Field Programmable Gate Array (FPGA) controller using the FPGA tool in MATLAB. When the model was evaluated using a confusion matrix, regression, and mean squared error (MSE), the regression value was 0.967, the accuracy 97%, and the MSE 0.00100Mu. These results imply that the new detection system is reliable and very effective for the detection of COVID-19.
Submitted 24 September, 2023;
originally announced September 2023.
-
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Authors:
Cheikh M. Bamba Dione,
David Adelani,
Peter Nabende,
Jesujoba Alabi,
Thapelo Sindane,
Happy Buzaaba,
Shamsuddeen Hassan Muhammad,
Chris Chinenye Emezue,
Perez Ogayo,
Anuoluwapo Aremu,
Catherine Gitau,
Derguene Mbaye,
Jonathan Mukiibi,
Blessing Sibanda,
Bonaventure F. P. Dossou,
Andiswa Bukula,
Rooweither Mabuya,
Allahsera Auguste Tapo,
Edwin Munkoh-Buabeng,
victoire Memdjokam Koagne,
Fatoumata Ouoba Kabore,
Amelia Taylor,
Godson Kalipe,
Tebogo Macucwa,
Vukosi Marivate
, et al. (19 additional authors not shown)
Abstract:
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using conditional random fields (CRF) and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.
Submitted 23 May, 2023;
originally announced May 2023.
-
Developing Smart Web-Search Using RegEx
Authors:
Ikechukwu Onyenwe,
Stanley Ogbonna,
Ebele Onyedimma,
Onyedikachukwu Ikechukwu-Onyenwe,
Chidinma Nwafor
Abstract:
Due to the increasing volume of data stored by web applications, it is becoming very difficult to provide comprehensive search results using keyword-based search alone, which makes it harder for web users to find information on the web. In this paper, we propose a combined method of keyword-based and regular expression (RegEx) searches that uses strings of targeted items to deliver optimal results even as the volume of data on the Internet continues to explode. The idea is to embed RegEx patterns in the search engine's algorithm of a web application project, so that strings related to the targeted items provide more comprehensive coverage of search results. The user's search query is a string of characters guided by search boundaries selected from the entry point, and the search operation returns different results within a category determined by those boundaries. This is designed to benefit a user who has only a vague idea of the information he/she wants to find but knows the boundaries within which to find it. This technique can be applied to data processing tasks such as information extraction and search refinement.
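A minimal Python sketch of a combined keyword and RegEx search within a user-selected boundary. The catalogue, the category names and the matching rule (every query word must begin some word of an item, in any order) are hypothetical illustrations; the paper's actual patterns are not given in the abstract.

```python
import re
from typing import Dict, List

# Hypothetical catalogue: each key is a search boundary (category).
CATALOGUE: Dict[str, List[str]] = {
    "books": ["Introduction to Data Mining", "Data Structures in C",
              "Mining Massive Datasets", "Database Systems"],
    "electronics": ["Data cable USB-C", "Smart TV 43 inch"],
}


def build_pattern(query: str) -> "re.Pattern[str]":
    """Every query word must start some word in the title, in any order."""
    lookaheads = "".join(rf"(?=.*\b{re.escape(word)})" for word in query.split())
    return re.compile(lookaheads, re.IGNORECASE)


def search(query: str, boundary: str) -> List[str]:
    items = CATALOGUE.get(boundary, [])
    # 1. Plain keyword search: the whole query as a case-insensitive substring.
    keyword_hits = [it for it in items if query.lower() in it.lower()]
    # 2. RegEx search widens coverage to reordered or partial-word matches.
    pattern = build_pattern(query)
    regex_hits = [it for it in items if it not in keyword_hits and pattern.match(it)]
    return keyword_hits + regex_hits


if __name__ == "__main__":
    print(search("mining data", "books"))
```

Here the query "mining data" finds no exact keyword match, yet the RegEx stage still returns both data-mining titles, which illustrates the wider coverage the abstract aims for. Keyword hits are listed first so that exact matches keep priority.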
Submitted 10 October, 2021;
originally announced October 2021.
-
Developing Products Update-Alert System for e-Commerce Websites Users Using HTML Data and Web Scraping Technique
Authors:
Ikechukwu Onyenwe,
Ebele Onyedinma,
Chidinma Nwafor,
Obinna Agbata
Abstract:
Websites are regarded as domains of limitless information that anyone can access. New technology trends have changed the way we do business: the Internet is fast becoming a new place for business, and advances in this technology have increased the number of e-commerce websites. This has made life easier for marketers/vendors, retailers and consumers (collectively regarded as users in this paper), because these websites provide easy platforms to sell/order items through the Internet. However, users also have to spend a lot of time and effort searching for the best product deals, product updates and offers on e-commerce websites. They have to filter and compare search results themselves, which takes a lot of time and may produce ambiguous results. In this paper, we applied web crawling and scraping methods to an e-commerce website to obtain HTML data for identifying product updates based on the current time. The HTML data is preprocessed to extract product details such as name, price, and posting date and time, to serve as useful information for users.
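The preprocessing step described above can be sketched with Python's standard-library `html.parser`. The product markup (the class names `product`, `name`, `price` and `posted`) and the date format are hypothetical, and the page is an inline string rather than a crawled one; a real site needs parsing rules matched to its own HTML.

```python
from datetime import datetime
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ProductParser(HTMLParser):
    """Collects name, price and posted fields from hypothetical product cards."""

    FIELDS = {"name", "price", "posted"}

    def __init__(self) -> None:
        super().__init__()
        self.products: List[Dict[str, str]] = []
        self._field: Optional[str] = None  # field whose text is being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "product":
            self.products.append({})  # start a new product record
        elif cls in self.FIELDS and self.products:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            record = self.products[-1]
            record[self._field] = record.get(self._field, "") + data.strip()


def new_products(html: str, since: datetime) -> List[Dict[str, str]]:
    """Return products whose posting time is later than `since` (the last check)."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [p for p in parser.products
            if datetime.strptime(p["posted"], "%Y-%m-%d %H:%M") > since]


PAGE = """
<div class="product"><span class="name">Phone X</span>
  <span class="price">NGN 120,000</span><span class="posted">2021-08-30 09:15</span></div>
<div class="product"><span class="name">Laptop Y</span>
  <span class="price">NGN 350,000</span><span class="posted">2021-08-28 17:40</span></div>
"""

if __name__ == "__main__":
    for product in new_products(PAGE, since=datetime(2021, 8, 29)):
        print(product)
```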
Submitted 1 September, 2021;
originally announced September 2021.
-
Text Classification Using Hybrid Machine Learning Algorithms on Big Data
Authors:
D. C. Asogwa,
S. O. Anigbogu,
I. E. Onyenwe,
F. A. Sani
Abstract:
Recently, there has been unprecedented growth in data originating from different online platforms, contributing to big data in terms of volume, velocity, variety and veracity (4Vs). Given the unstructured nature of big data, performing analytics to extract meaningful information is currently a great challenge. Collecting and analyzing unstructured textual data allows decision makers to study the escalation of comments/posts on our social media platforms. Hence, there is a need for automatic big data analysis to overcome the noise and unreliability of these unstructured datasets from digital media platforms. However, current machine learning algorithms are performance-driven, focusing on classification/prediction accuracy based on known properties learned from the training samples. When learning from a large dataset, most machine learning models require high computational cost, which leads to computational complexity. In this work, two supervised machine learning algorithms, Naïve Bayes and support vector machines (SVM), are combined with text mining techniques to produce a hybrid model. This is intended to increase the efficiency and accuracy of the results and to reduce the computational cost and complexity. The system also provides an open platform where a group of people with a common interest can share comments/messages, which are automatically classified as legal or illegal. This improves the quality of conversation among users. The hybrid model was developed using WEKA tools and the Java programming language. The results show that the hybrid model achieved 96.76% accuracy, compared with 61.45% and 69.21% for the Naïve Bayes and SVM models, respectively.
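The abstract does not say how Naïve Bayes and SVM are combined in the WEKA/Java hybrid, so only the Naïve Bayes component is sketched here, in Python, as a multinomial classifier with add-one (Laplace) smoothing that labels comments legal or illegal. The training comments are hypothetical toy data, and this is not the authors' hybrid model.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        words = [w for w in tokenize(doc) if w in self.vocab]  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = self.totals[label] + len(self.vocab)
            # log P(class) + sum of log P(word | class)
            score = math.log(count / n_docs)
            score += sum(math.log((self.word_counts[label][w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
COMMENTS = ["great discussion thanks", "thanks for sharing this",
            "buy stolen cards here", "send money for fake documents"]
LABELS = ["legal", "legal", "illegal", "illegal"]
model = MultinomialNB().fit(COMMENTS, LABELS)

if __name__ == "__main__":
    print(model.predict("fake cards for sale"))
```

Scores are summed in log space to avoid numerical underflow when comments are long.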
Submitted 30 March, 2021;
originally announced March 2021.
-
An Automated Multiple-Choice Question Generation Using Natural Language Processing Techniques
Authors:
Chidinma A. Nwafor,
Ikechukwu E. Onyenwe
Abstract:
Automatic multiple-choice question generation (MCQG) is a useful yet challenging task in Natural Language Processing (NLP). It is the task of automatically generating correct and relevant questions from textual data. Manually creating a sizeable set of meaningful and relevant questions is a time-consuming and challenging task for teachers. In this paper, we present an NLP-based system for automatic MCQG for Computer-Based Testing Examination (CBTE). We used an NLP technique to extract keywords, i.e. the important words in a given lesson material. To validate that the system is not perverse, five lesson materials were used to check its effectiveness and efficiency. The keywords manually extracted by the teacher were compared with the auto-generated keywords, and the results show that the system is capable of extracting keywords from lesson materials for setting examinable questions. This outcome is presented in a user-friendly interface for easy accessibility.
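The abstract does not name the keyword extraction technique. The Python sketch below assumes TF-IDF ranking over a set of lesson materials, then compares the top-ranked words with a teacher's keyword list using precision and recall, mirroring the manual-versus-automatic comparison described above. The stopword list and the lessons are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple

# Small illustrative stopword list (hypothetical; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in", "is",
             "it", "of", "on", "that", "the", "to", "which"}


def tokens(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(lessons: List[str], index: int, k: int) -> List[str]:
    """Rank words of lessons[index] by TF-IDF computed over all lessons."""
    docs = [tokens(t) for t in lessons]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    tf = Counter(docs[index])
    length = len(docs[index])
    score = {w: (c / length) * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(score, key=lambda w: (-score[w], w))[:k]


def overlap_scores(extracted: List[str], teacher: Set[str]) -> Tuple[float, float]:
    """Precision and recall of auto-extracted keywords against the teacher's list."""
    hits = len(set(extracted) & teacher)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(teacher) if teacher else 0.0
    return precision, recall


LESSONS = [
    "A compiler translates source code. The compiler checks syntax and the compiler generates machine code.",
    "An operating system manages memory. The operating system schedules processes and memory.",
    "A database stores records. Queries retrieve records from the database.",
]

if __name__ == "__main__":
    keywords = top_keywords(LESSONS, 0, 3)
    print(keywords, overlap_scores(keywords, {"compiler", "syntax", "code"}))
```

The IDF factor down-weights words that occur in many lessons, so terms specific to one lesson rank higher as candidate answers for its questions.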
Submitted 26 March, 2021;
originally announced March 2021.
-
The impact of political party/candidate on the election results from a sentiment analysis perspective using #AnambraDecides2017 tweets
Authors:
Ikechukwu Onyenwe,
Samuel Nwagbo,
Njideka Mbeledogu,
Ebele Onyedinma
Abstract:
This work empirically investigates the impact of a political party's control over its candidates, or vice versa, on winning an election, using a natural language processing technique called sentiment analysis (SA). To do this, a set of 7430 tweets bearing or related to #AnambraDecides2017 was streamed during the November 18, 2017, Anambra State gubernatorial election. These are Twitter discussions on the top five political parties and their candidates, termed political actors in this paper. We conduct polarity and subjectivity sentiment analyses on all the tweets, considering time as a useful dimension of SA. Furthermore, we use word frequency to find the words most associated with the political actors at a given time. We find the most talked-about topics using a topic modeling algorithm, and examine how the computed sentiments and most frequent words relate to the topics per political actor. Among other things, we deduced from the experimental results that even though a political party serves as a platform that sells the personality of a candidate, the acceptance of the candidate/party adds to the winning of an election. For example, we found that the winner of the election, Willie Obiano, benefited from the values his party shares among the people of the State. Associating his name with his party, the All Progressives Grand Alliance (APGA), displays more positive sentiments, and the subjectivity analysis indicates that Twitter users mentioning APGA are less emotionally subjective in their tweets than those mentioning the other parties.
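A minimal Python sketch of per-actor polarity and subjectivity scoring. The abstract does not name the sentiment tool, so this assumes a tiny hand-made lexicon of (polarity, subjectivity) scores averaged over the matched words of each tweet. The lexicon values, the placeholder actors PartyA and PartyB and the tweets are all hypothetical, and the time dimension is left out.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON: Dict[str, Tuple[float, float]] = {
    "great": (0.8, 0.75), "good": (0.7, 0.6), "win": (0.8, 0.4),
    "bad": (-0.7, 0.67), "rigged": (-0.6, 0.9), "corrupt": (-0.5, 0.5),
}


def tweet_sentiment(tweet: str) -> Tuple[float, float]:
    """Average polarity and subjectivity of the lexicon words in one tweet."""
    hits = [LEXICON[w] for w in re.findall(r"[a-z]+", tweet.lower()) if w in LEXICON]
    if not hits:
        return 0.0, 0.0  # no opinion words: treated as neutral and objective
    return mean(p for p, _ in hits), mean(s for _, s in hits)


def sentiment_by_actor(tweets: List[str], actors: List[str]) -> Dict[str, Tuple[float, float]]:
    """Mean tweet polarity and subjectivity for every actor a tweet mentions."""
    per_actor: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for tweet in tweets:
        scores = tweet_sentiment(tweet)
        for actor in actors:
            if actor.lower() in tweet.lower():
                per_actor[actor].append(scores)
    return {a: (mean(p for p, _ in v), mean(s for _, s in v)) for a, v in per_actor.items()}


TWEETS = ["PartyA will win, great turnout", "Results rigged, says PartyB", "Good day for PartyA"]

if __name__ == "__main__":
    print(sentiment_by_actor(TWEETS, ["PartyA", "PartyB"]))
```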
Submitted 7 July, 2020;
originally announced July 2020.
-
Igbo-English Machine Translation: An Evaluation Benchmark
Authors:
Ignatius Ezeani,
Paul Rayson,
Ikechukwu Onyenwe,
Chinedu Uchechukwu,
Mark Hepple
Abstract:
Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, work on African languages is lagging. Much of the focus is on well-resourced languages such as English, Japanese, German, French, Russian, Mandarin Chinese, etc. Over 97% of the world's 7000 languages, including African languages, are low-resourced for NLP, i.e. they have little or no data, tools, and techniques for NLP research. For instance, only 5 out of 2965 (0.19%) authors of full-text papers in the ACL Anthology extracted from the 5 major conferences in 2018 (ACL, NAACL, EMNLP, COLING and CoNLL) are affiliated with African institutions. In this work, we discuss our effort toward building a standard machine translation benchmark dataset for Igbo, one of the 3 major Nigerian languages. Igbo is spoken by more than 50 million people globally, with over 50% of the speakers in southeastern Nigeria. Igbo is low-resourced, although there have been some efforts toward developing IgboNLP, such as part-of-speech tagging and diacritic restoration.
Submitted 1 April, 2020;
originally announced April 2020.
-
Performance Evaluation of Histogram Equalization and Fuzzy image Enhancement Techniques on Low Contrast Images
Authors:
E Onyedinma,
I Onyenwe,
H Inyiama
Abstract:
Image enhancement aims at improving the information content of the original image for a specific purpose, such as visual interpretation or the effective extraction of required details. However, some acquired images have pixels of low dynamic range and are therefore low-contrast images. Enhancing the contrast increases the dynamic range of the gray levels in the acquired image so that it spans the full intensity range. Techniques such as Histogram Equalization (HE) and the fuzzy technique can be adopted for contrast enhancement. HE adjusts the contrast of an input image by modifying the intensity distribution of its histogram. It takes a global approach to image enhancement and is computationally fast and easy to implement, but it can introduce unnatural artifacts and other undesirable elements into the resulting image. The fuzzy technique, for its part, enhances an image by mapping its gray-level intensities into a fuzzy plane using membership functions, modifying the membership functions as desired, and mapping back into the gray-level plane. Thus, details in desired areas can be enhanced at the expense of increased computational cost. This paper explores the effect of using HE and the fuzzy technique to enhance low-contrast images. Their performances are evaluated using the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), entropy and Absolute Mean Brightness Error (AMBE).
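Global histogram equalization and two of the listed metrics (AMBE and entropy) can be sketched in Python with NumPy; the fuzzy technique is not shown. The mapping used is the standard one, h(v) = round((cdf(v) - cdf_min) / (MN - cdf_min) * (L - 1)), where MN is the number of pixels and L = 256 gray levels. The synthetic low-contrast test image is an assumption.

```python
import numpy as np


def histogram_equalize(img: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]  # CDF at the darkest gray level present
    if cdf[-1] == cdf_min:     # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    return np.clip(lut, 0, levels - 1).astype(np.uint8)[img]


def ambe(original: np.ndarray, enhanced: np.ndarray) -> float:
    """Absolute Mean Brightness Error: |mean(original) - mean(enhanced)|."""
    return abs(float(original.mean()) - float(enhanced.mean()))


def entropy(img: np.ndarray) -> float:
    """Shannon entropy (bits) of the gray-level histogram."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    low = rng.integers(100, 140, size=(64, 64), dtype=np.uint8)  # synthetic low-contrast image
    eq = histogram_equalize(low)
    print("range:", low.min(), low.max(), "->", eq.min(), eq.max())
    print(f"AMBE={ambe(low, eq):.2f} entropy {entropy(low):.3f} -> {entropy(eq):.3f}")
```

Because HE only remaps gray levels (it can merge bins but never split them), entropy cannot increase, while AMBE measures how far the mean brightness has shifted, which is one way the artifacts mentioned above show up.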
Submitted 1 September, 2019;
originally announced September 2019.