
Showing 1–10 of 10 results for author: Ghaleb, M

  1. arXiv:2501.11167  [pdf, other]

    cs.LG cs.IT

    Federated Testing (FedTest): A New Scheme to Enhance Convergence and Mitigate Adversarial Attacks in Federating Learning

    Authors: Mustafa Ghaleb, Mohanad Obeed, Muhamad Felemban, Anas Chaaban, Halim Yanikomeroglu

    Abstract: Federated Learning (FL) has emerged as a significant paradigm for training machine learning models. This is due to its data-privacy-preserving property and its efficient exploitation of distributed computational resources. This is achieved by conducting the training process in parallel at distributed users. However, traditional FL strategies grapple with difficulties in evaluating the quality of r…

    Submitted 19 January, 2025; originally announced January 2025.

  2. arXiv:2407.19051  [pdf, other]

    cs.NI cs.AI

    Towards a Transformer-Based Pre-trained Model for IoT Traffic Classification

    Authors: Bruna Bazaluk, Mosab Hamdan, Mustafa Ghaleb, Mohammed S. M. Gismalla, Flavio S. Correa da Silva, Daniel Macêdo Batista

    Abstract: The classification of IoT traffic is important to improve the efficiency and security of IoT-based networks. As the state-of-the-art classification methods are based on Deep Learning, most of the current results require a large amount of data to be trained. Thereby, in real-life situations, where there is a scarce amount of IoT traffic data, the models would not perform so well. Consequently, thes…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Updated version of: B. Bazaluk, M. Hamdan, M. Ghaleb, M. S. M. Gismalla, F. S. Correa da Silva and D. M. Batista, "Towards a Transformer-Based Pre-trained Model for IoT Traffic Classification," NOMS 2024-2024 IEEE Network Operations and Management Symposium, Seoul, Korea, Republic of, 2024, pp. 1-7, doi: 10.1109/NOMS59830.2024.10575448

  3. arXiv:2402.03177  [pdf, other]

    cs.CL cs.LG

    CIDAR: Culturally Relevant Instruction Dataset For Arabic

    Authors: Zaid Alyafeai, Khalid Almubarak, Ahmed Ashraf, Deema Alnuhait, Saied Alshahrani, Gubran A. Q. Abdulrahman, Gamil Ahmed, Qais Gawah, Zead Saleh, Mustafa Ghaleb, Yousef Ali, Maged S. Al-Shaibani

    Abstract: Instruction tuning has emerged as a prominent methodology for teaching Large Language Models (LLMs) to follow instructions. However, current instruction datasets predominantly cater to English or are derived from English-dominated LLMs, resulting in inherent biases toward Western culture. This bias significantly impacts the linguistic structures of non-English languages such as Arabic, which has a…

    Submitted 5 February, 2024; originally announced February 2024.

  4. arXiv:2308.03771  [pdf]

    math.OC cs.DM cs.LO

    Reliability Analysis of a Multi-State Truly-Threshold System Using a Multi-Valued Karnaugh Map

    Authors: Ali Muhammad Ali Rushdi, Fares Ahmad Muhammad Ghaleb

    Abstract: This paper deals with the Boolean-based analysis of a prominent class of non-repairable coherent multistate systems with independent nonidentical multistate components. This class of systems is represented by a multistate coherent truly threshold system of several states, which is not necessarily binary-imaged. The paper represents such a system via Boolean expressions of system success or system…

    Submitted 26 July, 2023; originally announced August 2023.

    Comments: 42 pages, 11 figures, 5 tables

  5. File Fragment Classification using Light-Weight Convolutional Neural Networks

    Authors: Mustafa Ghaleb, Kunwar Saaim, Muhamad Felemban, Saleh Al-Saleh, Ahmad Al-Mulhem

    Abstract: In digital forensics, file fragment classification is an important step toward completing the file carving process. There exist several techniques to identify the type of file fragments without relying on meta-data, such as using features like header/footer and N-gram to identify the fragment type. Recently, convolutional neural network (CNN) models have been used to build classification models to ach…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 11 pages, 17 figures

    Journal ref: IEEE Access.12. (2024) 157179-157191

  6. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  7. arXiv:2208.00932  [pdf, other]

    cs.CL

    Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets

    Authors: Yousef Altaher, Ali Fadel, Mazen Alotaibi, Mazen Alyazidi, Mishari Al-Mutairi, Mutlaq Aldhbuiub, Abdulrahman Mosaibah, Abdelrahman Rezk, Abdulrazzaq Alhendi, Mazen Abo Shal, Emad A. Alghamdi, Maged S. Alshaibani, Jezia Zakraoui, Wafaa Mohammed, Kamel Gaanoun, Khalid N. Elmadani, Mustafa Ghaleb, Nouamane Tazi, Raed Alharbi, Maraim Masoud, Zaid Alyafeai

    Abstract: Masader (Alyafeai et al., 2021) created a metadata structure to be used for cataloguing Arabic NLP datasets. However, developing an easy way to explore such a catalogue is a challenging task. In order to give the optimal experience for users and researchers exploring the catalogue, several design and user experience challenges must be resolved. Furthermore, user interactions with the website may p…

    Submitted 1 August, 2022; originally announced August 2022.

  8. arXiv:2110.06744  [pdf, other]

    cs.CL

    Masader: Metadata Sourcing for Arabic Text and Speech Data Resources

    Authors: Zaid Alyafeai, Maraim Masoud, Mustafa Ghaleb, Maged S. Al-shaibani

    Abstract: The NLP pipeline has evolved dramatically in the last few years. The first step in the pipeline is to find suitable annotated datasets to evaluate the tasks we are trying to solve. Unfortunately, most of the published datasets lack metadata annotations that describe their attributes. Not to mention, the absence of a public catalogue that indexes all the publicly available datasets related to speci…

    Submitted 13 October, 2021; originally announced October 2021.

  9. arXiv:2106.10745  [pdf, other]

    cs.CL

    Calliar: An Online Handwritten Dataset for Arabic Calligraphy

    Authors: Zaid Alyafeai, Maged S. Al-shaibani, Mustafa Ghaleb, Yousif Ahmed Al-Wajih

    Abstract: Calligraphy is an essential part of the Arabic heritage and culture. It has been used in the past for the decoration of houses and mosques. Usually, such calligraphy is designed manually by experts with aesthetic insights. In the past few years, there has been a considerable effort to digitize such type of art by either taking a photo of decorated buildings or drawing them using digital devices. T…

    Submitted 25 June, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

  10. arXiv:2106.07540  [pdf, other]

    cs.CL cs.LG

    Evaluating Various Tokenizers for Arabic Text Classification

    Authors: Zaid Alyafeai, Maged S. Al-shaibani, Mustafa Ghaleb, Irfan Ahmad

    Abstract: The first step in any NLP pipeline is to split the text into individual tokens. The most obvious and straightforward approach is to use words as tokens. However, given a large text corpus, representing all the words is not efficient in terms of vocabulary size. In the literature, many tokenization algorithms have emerged to tackle this problem by creating subwords which in turn limits the vocabula…

    Submitted 28 September, 2021; v1 submitted 14 June, 2021; originally announced June 2021.