Search | arXiv e-print repository

Big Data Architecture for Large Organizations

Authors: Fathima Nuzla Ismail, Abira Sengupta, Shanika Amarasoma

Abstract: The exponential growth of big data has transformed how large organisations leverage information to drive innovation, optimise processes, and maintain competitive advantages. However, managing and extracting insights from vast, heterogeneous data sources requires a scalable, secure, and well-integrated big data architecture. This paper proposes a comprehensive big data framework that aligns with organisational objectives while ensuring flexibility, scalability, and governance. The architecture encompasses multiple layers, including data ingestion, transformation, storage, analytics, machine learning, and security, incorporating emerging technologies such as Generative AI (GenAI) and low-code machine learning. Cloud-based implementations across Google Cloud, AWS, and Microsoft Azure are analysed, highlighting their tools and capabilities. Additionally, this study explores advancements in big data architecture, including AI-driven automation, data mesh, and Data Ocean paradigms. By establishing a structured, adaptable framework, this research provides a foundational blueprint for large organisations to harness big data as a strategic asset effectively.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2501.11094 [pdf, other]

Enhanced Suicidal Ideation Detection from Social Media Using a CNN-BiLSTM Hybrid Model

Authors: Mohaiminul Islam Bhuiyan, Nur Shazwani Kamarudin, Nur Hafieza Ismail

Abstract: Suicidal ideation detection is crucial for preventing suicides, a leading cause of death worldwide. Many individuals express suicidal thoughts on social media, offering a vital opportunity for early detection through advanced machine learning techniques. The identification of suicidal ideation in social media text is improved by utilising a hybrid framework that integrates Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM), enhanced with an attention mechanism. To enhance the interpretability of the model's predictions, Explainable AI (XAI) methods are applied, with a particular focus on SHapley Additive exPlanations (SHAP). Initially, the model reached an accuracy of 92.81%. By applying fine-tuning and early stopping techniques, the accuracy improved to 94.29%. The SHAP analysis revealed key features influencing the model's predictions, such as terms related to mental health struggles. This level of transparency boosts the model's credibility while helping mental health professionals understand and trust the predictions. This work highlights the potential for improving the accuracy and interpretability of detecting suicidal tendencies, making a valuable contribution to the progress of mental health monitoring systems. It emphasizes the significance of blending powerful machine learning methods with explainability to develop reliable and impactful mental health solutions.

Submitted 19 January, 2025; originally announced January 2025.

arXiv:2501.09309 [pdf]

doi 10.14569/IJACSA.2024.0151133

Understanding Mental Health Content on Social Media and Its Effect Towards Suicidal Ideation

Authors: Mohaiminul Islam Bhuiyan, Nur Shazwani Kamarudin, Nur Hafieza Ismail

Abstract: This review underscores the critical need for effective strategies to identify and support individuals with suicidal ideation, exploiting technological innovations in ML and DL to further suicide prevention efforts. The study details the application of these technologies in analyzing vast amounts of unstructured social media data to detect linguistic patterns, keywords, phrases, tones, and contextual cues associated with suicidal thoughts. It explores various ML and DL models like SVMs, CNNs, LSTM, neural networks, and their effectiveness in interpreting complex data patterns and emotional nuances within text data. The review discusses the potential of these technologies to serve as a life-saving tool by identifying at-risk individuals through their digital traces. Furthermore, it evaluates the real-world effectiveness, limitations, and ethical considerations of employing these technologies for suicide prevention, stressing the importance of responsible development and usage. The study aims to fill critical knowledge gaps by analyzing recent studies, methodologies, tools, and techniques in this field. It highlights the importance of synthesizing current literature to inform practical tools and suicide prevention efforts, guiding innovation in reliable, ethical systems for early intervention. This research synthesis evaluates the intersection of technology and mental health, advocating for the ethical and responsible application of ML, DL, and NLP to offer life-saving potential worldwide while addressing challenges like generalizability, biases, privacy, and the need for further research to ensure these technologies do not exacerbate existing inequities and harms.

Submitted 16 January, 2025; originally announced January 2025.

arXiv:2411.06798 [pdf]

LA4SR: illuminating the dark proteome with generative AI

Authors: David R. Nelson, Ashish Kumar Jaiswal, Noha Ismail, Alexandra Mystikou, Kourosh Salehi-Ashtiani

Abstract: AI language models (LMs) show promise for biological sequence analysis. We re-engineered open-source LMs (GPT-2, BLOOM, DistilRoBERTa, ELECTRA, and Mamba, ranging from 70M to 12B parameters) for microbial sequence classification. The models achieved F1 scores up to 95 and operated 16,580x faster and at 2.9x the recall of BLASTP. They effectively classified the algal dark proteome - uncharacterized proteins comprising about 65% of total proteins - validated on new data including a new, complete Hi-C/Pacbio Chlamydomonas genome. Larger (>1B) LA4SR models reached high accuracy (F1 > 86) when trained on less than 2% of available data, rapidly achieving strong generalization capacity. High accuracy was achieved when training data had intact or scrambled terminal information, demonstrating robust generalization to incomplete sequences. Finally, we provide custom AI explainability software tools for attributing amino acid patterns to AI generative processes and interpret their outputs in evolutionary and biophysical contexts.

Submitted 11 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

arXiv:2406.10068 [pdf, other]

doi 10.1109/3DV53792.2021.00130

DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications

Authors: Li Li, Khalid N. Ismail, Hubert P. H. Shum, Toby P. Breckon

Abstract: We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark task using depth estimation for autonomous driving applications. Our driving platform is equipped with a high resolution 128 channel LiDAR, a 2MPix stereo camera, a lux meter and a GNSS/INS system. Ambient and reflectivity images are made available along with the LiDAR point clouds to facilitate multi-modal use of concurrent ambient and reflectivity scene information. Leveraging DurLAR, with a resolution exceeding that of prior benchmarks, we consider the task of monocular depth estimation and use this increased availability of higher resolution, yet sparse ground truth scene depth information to propose a novel joint supervised/self-supervised loss formulation. We compare performance over our new DurLAR dataset, the established KITTI benchmark and the Cityscapes dataset. Our evaluation shows that our joint use of supervised and self-supervised loss terms, enabled via the superior ground truth resolution and availability within DurLAR, improves the quantitative and qualitative performance of leading contemporary monocular depth estimation approaches (RMSE=3.639, Sq Rel=0.936).

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by 3DV 2021; 13 pages, 14 figures; Dataset at https://github.com/l1997i/durlar

Journal ref: Proc. Int. Conf. on 3D Vision (3DV 2021)

arXiv:2006.03379 [pdf]

doi 10.5121/ijcnc.2020.12302

6RLR-ABC: 6LoWPAN Routing Protocol With Local Repair Using Bio Inspired Artificial Bee Colony

Authors: Nurul Halimatul Asmak Ismail, Samer A. B. Awwad, Rosilah Hassan

Abstract: In recent years, Micro-Electro-Mechanical System (MEMS) has successfully enabled the development of IPv6 over Low power Wireless Personal Area Network (6LoWPAN). This network is equipped with low-cost, low-power, lightweight and varied-function devices. These devices are capable of amassing, storing, and processing environmental information and conversing with neighbouring sensors. These requisites pose a new and interesting challenge for the development of IEEE 802.15.4 together with a routing protocol. In this work, 6LoWPAN Routing Protocol with Local Repair Using Bio Inspired Artificial Bee Colony (6RLR-ABC) has been introduced. This protocol supports connection establishment between nodes in an energy-efficient manner while maintaining high packet delivery ratio and throughput and minimizing average end-to-end delay. This protocol has been evaluated based on increasing generated traffic. The performance of the designed 6RLR-ABC routing protocol has been evaluated against the 6LoWPAN Ad-hoc On-Demand Distance Vector (LOAD) routing protocol. The LOAD protocol has been chosen since it is the most relevant existing 6LoWPAN routing protocol. The simulation results show that the introduced 6RLR-ABC protocol achieves lower average end-to-end packet delay and lower energy consumption compared to the LOAD protocol. Additionally, the packet delivery ratio of the designed protocol is much higher than that of the LOAD protocol. The proposed 6RLR-ABC achieved about 39% higher packet delivery ratio and about 54.8% higher throughput while simultaneously offering lower average end-to-end delay and lower average energy consumption than the LOAD protocol.

Submitted 5 June, 2020; originally announced June 2020.

Comments: 19 pages, 12 figures

arXiv:1906.11893 [pdf]

HalalNet: A Deep Neural Network that Classifies the Halalness Slaughtered Chicken from their Images

Authors: A. Elfakharany, R. Yusof, N. Ismail, R. Arfa, M. Yunus

Abstract: The halal requirement in food is important for millions of Muslims worldwide, especially for meat and chicken products. Ensuring that slaughter houses adhere to this requirement is a challenging task to do manually. In this paper a method is proposed that uses a camera to take images of slaughtered chicken on the conveyor in a slaughter house; the images are then analyzed by a deep neural network to classify whether the image is of a halal slaughtered chicken or not. However, traditional deep learning models require large amounts of data to train on, and in this case such amounts of data were challenging to collect, especially the images of non-halal slaughtered chicken. Hence this paper shows how the use of one shot learning [1] and transfer learning [2] can reach high accuracy on the small amount of data that was available. The architecture used is based on the Siamese neural networks architecture, which ranks the similarity between two inputs [3], while using the Xception network [4] as the twin networks. We call it HalalNet. This work was done as part of SYCUT (syriah compliant slaughtering system), which is a monitoring system that monitors the halalness of the slaughtered chicken in a slaughter house. The data used to train and validate HalalNet was collected from the Azain slaughtering site (Semenyih, Selangor, Malaysia) and contains images of both halal and non-halal slaughtered chicken.

Submitted 10 June, 2019; originally announced June 2019.

Comments: Submitted in the International Conference on Artificial Intelligence and Robotics for Industrial Applications, AIR2018

Journal ref: International Journal of Integrated Engineering, Vol. 11, no. 4, Sept. 2019, https://publisher.uthm.edu.my/ojs/index.php/ijie/article/view/4194

arXiv:1903.10604 [pdf, other]

An Approach for Adaptive Automatic Threat Recognition Within 3D Computed Tomography Images for Baggage Security Screening

Authors: Qian Wang, Khalid N. Ismail, Toby P. Breckon

Abstract: The screening of baggage using X-ray scanners is now routine in aviation security with automatic threat detection approaches, based on 3D X-ray computed tomography (CT) images, known as Automatic Threat Recognition (ATR) within the aviation security industry. These current strategies use pre-defined threat material signatures in contrast to adaptability towards new and emerging threat signatures. To address this issue, the concept of adaptive automatic threat recognition (AATR) was proposed in previous work. In this paper, we present a solution to AATR based on such X-ray CT baggage scan imagery. This aims to address the issues of rapidly evolving threat signatures within the screening requirements. Ideally, the detection algorithms deployed within the security scanners should be readily adaptable to different situations with varying requirements of threat characteristics (e.g., threat material, physical properties of objects). We tackle this issue using a novel adaptive machine learning methodology with our solution consisting of a multi-scale 3D CT image segmentation algorithm, a multi-class support vector machine (SVM) classifier for object material recognition and a strategy to enable the adaptability of our approach. Experiments are conducted on both open and sequestered 3D CT baggage image datasets specifically collected for the AATR study. Our proposed approach performs well on both recognition and adaptation. Overall, our approach can achieve a probability of detection of around 90% with a probability of false alarm below 20%. Our AATR shows the capability of adapting to varying types of materials, even unknown materials which are not available in the training data, adapting to varying required probabilities of detection and adapting to varying scales of the threat object.

Submitted 18 November, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

Comments: Technical Report, Durham University

arXiv:1212.2692 [pdf]

Enhanced skin colour classifier using RGB Ratio model

Authors: Ghazali Osman, Muhammad Suzuri Hitam, Mohd Nasir Ismail

Abstract: Skin colour detection has frequently been used for searching people, face detection, pornographic filtering and hand tracking. The presence of skin or non-skin in a digital image can be determined by manipulating pixel colour or pixel texture. The main problem in skin colour detection is to represent a skin colour distribution model that is invariant or least sensitive to changes in illumination conditions. Another problem comes from the fact that many objects in the real world may possess colours very similar to skin tone, such as wood, leather, skin-coloured clothing, hair and sand. Moreover, skin colour differs between races and can differ from one person to another, even among people of the same ethnicity. Finally, skin colour will appear a little different when different types of camera are used to capture the object or scene. The objective of this study is to develop a pixel-based skin colour classifier using the RGB ratio model. The RGB ratio model is a newly proposed method that belongs to the category of explicitly defined skin region models. This skin classifier was tested with the SIdb dataset and two benchmark datasets, UChile and TDSD, to measure classifier performance. The performance of the skin classifier was measured based on true positive (TP) and false positive (FP) indicators. This newly proposed model was compared with the Kovac, Saleh and Swift models. The experimental results showed that the RGB ratio model outperformed all the other models in terms of detection rate. The RGB ratio model is able to reduce FP detections caused by reddish object colours and is able to detect darkened skin and skin covered by shadow.

Submitted 11 December, 2012; originally announced December 2012.

Comments: 14 pages; International Journal on Soft Computing (IJSC) Vol.3, No.4, November 2012

MSC Class: 68T10

arXiv:1006.4568 [pdf]

Approaches, Challenges and Future Direction of Image Retrieval

Authors: Hui Hui Wang, Dzulkifli Mohamad, N. A. Ismail

Abstract: This paper discusses the evolution of retrieval approaches, focusing on the development, challenges and future direction of image retrieval. It highlights both the already addressed and the outstanding issues. The explosive growth of image data has led to the need for research and development in image retrieval. Image retrieval research is moving from keywords, to low level features, and on to semantic features. The drive towards semantic features is due to the problem that keywords can be very subjective and time consuming, while low level features cannot always describe high level concepts in the users' mind. This introduces an interpretation inconsistency between image descriptors and high level semantics that is known as the semantic gap. This paper also discusses the semantic gap issues, user query mechanisms, and common ways used to bridge the gap in image retrieval.

Submitted 23 June, 2010; originally announced June 2010.

Comments: IEEE Publication Format, https://sites.google.com/site/journalofcomputing/

Journal ref: Journal of Computing, Vol. 2, No. 6, June 2010, NY, USA, ISSN 2151-9617

arXiv:1006.4539 [pdf]

A Study of User's Performance and Satisfaction on the Web Based Photo Annotation with Speech Interaction

Authors: Siti Azura Ramlan, Nor Azman Ismail

Abstract: This paper reports on an empirical evaluation study of users' performance and satisfaction with a prototype of Web Based photo annotation with speech interaction. The participants were Johor Bahru citizens from various backgrounds. They completed two parts of an annotation task: part A involving PhotoASys, a photo annotation system with the proposed speech interaction, and part B involving the Microsoft Vista Speech Interaction style. They completed eight tasks for each part, including system login and selection of album and photos. Users' performance was recorded using computer screen recording software. Data were captured on task completion time and subjective satisfaction. Participants completed a questionnaire on subjective satisfaction when the task was finished. The performance data show the comparison between the proposed speech interaction and the Microsoft Vista Speech interaction applied in the photo annotation system, PhotoASys. On average, the reduction in annotation performance time due to using the proposed speech interaction style rather than the Microsoft Vista speech interaction style was 64.72%. Data analysis showed statistically significant differences in annotation performance and subjective satisfaction between the two styles of interaction. These results could be used for the next design in related software which involves personal belonging management.

Submitted 23 June, 2010; originally announced June 2010.

Comments: IEEE Publication Format, https://sites.google.com/site/journalofcomputing/

Journal ref: Journal of Computing, Vol. 2, No. 6, June 2010, NY, USA, ISSN 2151-9617

arXiv:1005.4014 [pdf]

A Study on Potential of Integrating Multimodal Interaction into Musical Conducting Education

Authors: Gilbert Phuah Leong Siang, Nor Azman Ismail, Pang Yee Yong

Abstract: With the rapid development of computer technology, computer music has begun to appear in the laboratory. The potential utility of computer music is gradually increasing. The purpose of this paper is to analyze the possibility of integrating multimodal interaction, such as vision-based hand gesture and speech interaction, into musical conducting education. To achieve this purpose, this paper focuses on discussing related research and traditional musical conducting education. To do so, six musical conductors were interviewed to share their experience of learning and teaching musical conducting. These interviews are analyzed in this paper to show the syllabus and the focus of musical conducting education for beginners.

Submitted 21 May, 2010; originally announced May 2010.

Comments: http://www.journalofcomputing.org

Journal ref: Journal of Computing, Volume 2, Issue 5, May 2010

arXiv:0906.0845 [pdf]

Analyzing of MOS and Codec Selection for Voice over IP Technology

Authors: Mohd Nazri Ismail

Abstract: In this research, we propose an architectural solution to implement the voice over IP (VoIP) service in a campus environment network. VoIP technology has become a topic of current discussion. Today, the deployment of this technology in an organization can give a great financial benefit over traditional telephony. Therefore, this study analyzes VoIP codec selection and investigates the Mean Opinion Score (MOS) performance areas involved with the quality of service delivered by soft phone and IP phone. This study focuses on quality of voice prediction such as i) accuracy of MOS between automated system and human perception and ii) different types of codec performance measurement via human perception using the MOS technique. In this study, a network management system (NMS) is used to monitor and capture the performance of VoIP in a campus environment. In addition, the most apparent aim of implementing soft phone and IP phone in a campus environment is to define the best codec selection that can be used in an operational environment. Based on the findings, MOS measurement through automated and manual systems is able to predict and evaluate VoIP performance. In addition, based on manual MOS measurement, VoIP conversations over LAN offer better reliability and availability performance compared to WAN.

Submitted 4 June, 2009; originally announced June 2009.

Comments: 14 pages, exposed on 5th International Conference "Actualities and Perspectives on Hardware and Software" - APHS2009, Timisoara, Romania

Journal ref: Ann. Univ. Tibiscus Comp. Sci. Series VII(2009),263-276

Showing 1–13 of 13 results for author: Ismail, N