Search | arXiv e-print repository

Fanar: An Arabic-Centric Multimodal Generative AI Platform

Authors: Fanar Team, Ummar Abbas, Mohammad Shahmeer Ahmad, Firoj Alam, Enes Altinisik, Ehsannedin Asgari, Yazan Boshmaf, Sabri Boughorbel, Sanjay Chawla, Shammur Chowdhury, Fahim Dalvi, Kareem Darwish, Nadir Durrani, Mohamed Elfeky, Ahmed Elmagarmid, Mohamed Eltabakh, Masoomali Fatehkia, Anastasios Fragkopoulos, Maram Hasanain, Majd Hawasly, Mus'ab Husaini, Soon-Gyo Jung, Ji Kim Lucas, Walid Magdy, Safa Messaoud , et al. (17 additional authors not shown)

Abstract: We present Fanar, a platform for Arabic-centric multimodal generative AI systems, that supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in the class on well established benchmarks for similar sized models. Fanar Star is a 7B (billion) parameter model that was trained from… ▽ More We present Fanar, a platform for Arabic-centric multimodal generative AI systems, that supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in the class on well established benchmarks for similar sized models. Fanar Star is a 7B (billion) parameter model that was trained from scratch on nearly 1 trillion clean and deduplicated Arabic, English and Code tokens. Fanar Prime is a 9B parameter model continually trained on the Gemma-2 9B base model on the same 1 trillion token set. Both models are concurrently deployed and designed to address different types of prompts transparently routed through a custom-built orchestrator. The Fanar platform provides many other capabilities including a customized Islamic Retrieval Augmented Generation (RAG) system for handling religious prompts, a Recency RAG for summarizing information about current or recent events that have occurred after the pre-training data cut-off date. The platform provides additional cognitive capabilities including in-house bilingual speech recognition that supports multiple Arabic dialects, voice and image generation that is fine-tuned to better reflect regional characteristics. Finally, Fanar provides an attribution service that can be used to verify the authenticity of fact based generated content. The design, development, and implementation of Fanar was entirely undertaken at Hamad Bin Khalifa University's Qatar Computing Research Institute (QCRI) and was sponsored by Qatar's Ministry of Communications and Information Technology to enable sovereign AI technology development. △ Less

Submitted 18 January, 2025; originally announced January 2025.

ACM Class: I.2.0; D.2.0

arXiv:2405.14277 [pdf, other]

Improving Language Models Trained on Translated Data with Continual Pre-Training and Dictionary Learning Analysis

Authors: Sabri Boughorbel, MD Rizwan Parvez, Majd Hawasly

Abstract: Training LLMs for low-resource languages usually utilizes data augmentation from English using machine translation (MT). This, however, brings a number of challenges to LLM training: there are large costs attached to translating and curating huge amounts of content with high-end machine translation solutions; the translated content carries over cultural biases; and if the translation is not faithf… ▽ More Training LLMs for low-resource languages usually utilizes data augmentation from English using machine translation (MT). This, however, brings a number of challenges to LLM training: there are large costs attached to translating and curating huge amounts of content with high-end machine translation solutions; the translated content carries over cultural biases; and if the translation is not faithful and accurate, data quality degrades causing issues in the trained model. In this work, we investigate the role of translation and synthetic data in training language models. We translate TinyStories, a dataset of 2.2M short stories for 3-4 year old children, from English to Arabic using the open NLLB-3B MT model. We train a number of story generation models of size 1M-33M parameters using this data. We identify a number of quality and task-specific issues in the resulting models. To rectify these issues, we further pre-train the models with a small dataset of synthesized high-quality Arabic stories generated by a capable LLM, representing 1% of the original training data. We show, using GPT-4 as a judge and Dictionary Learning Analysis from mechanistic interpretability, that the suggested approach is a practical means to resolve some of the machine translation pitfalls. We illustrate the improvements through case studies of linguistic and cultural bias issues. △ Less

Submitted 7 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 16 pages

arXiv:2405.01114 [pdf, ps, other]

Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling

Authors: Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling

Abstract: Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. O… ▽ More Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. Our approach leverages human locomotion demonstrations to learn the synergistic coupling of the lower limbs, enabling the prediction of the kinematic behavior of a missing limb during tasks such as walking, climbing inclines, and stairs. We propose a multitasking, continually adaptive model that anticipates and refines movements over time. At the core of our method is a technique called multitask prospective rehearsal, that anticipates and synthesizes future movements based on the previous prediction and employs a corrective mechanism for subsequent predictions. Our evolving architecture merges lightweight, task-specific modules on a shared backbone, ensuring both specificity and scalability. We validate our model through experiments on real-world human gait datasets, including transtibial amputees, across a wide range of locomotion tasks. Results demonstrate that our approach consistently outperforms baseline models, particularly in scenarios with distributional shifts, adversarial perturbations, and noise. △ Less

Submitted 5 June, 2025; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted at Transactions on Machine Learning Research (TMLR) 2025

arXiv:2403.17068 [pdf, other]

Semantic Ranking for Automated Adversarial Technique Annotation in Security Text

Authors: Udesh Kumarasinghe, Ahmed Lekssays, Husrev Taha Sencar, Sabri Boughorbel, Charitha Elvitigala, Preslav Nakov

Abstract: We introduce a new method for extracting structured threat behaviors from threat intelligence text. Our method is based on a multi-stage ranking architecture that allows jointly optimizing for efficiency and effectiveness. Therefore, we believe this problem formulation better aligns with the real-world nature of the task considering the large number of adversary techniques and the extensive body o… ▽ More We introduce a new method for extracting structured threat behaviors from threat intelligence text. Our method is based on a multi-stage ranking architecture that allows jointly optimizing for efficiency and effectiveness. Therefore, we believe this problem formulation better aligns with the real-world nature of the task considering the large number of adversary techniques and the extensive body of threat intelligence created by security analysts. Our findings show that the proposed system yields state-of-the-art performance results for this task. Results show that our method has a top-3 recall performance of 81\% in identifying the relevant technique among 193 top-level techniques. Our tests also demonstrate that our system performs significantly better (+40\%) than the widely used large language models when tested under a zero-shot setting. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2310.14819 [pdf, other]

Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic

Authors: Sabri Boughorbel, Majd Hawasly

Abstract: While significant progress has been made in benchmarking Large Language Models (LLMs) across various tasks, there is a lack of comprehensive evaluation of their abilities in responding to multi-turn instructions in less-commonly tested languages like Arabic. Our paper offers a detailed examination of the proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized Arabic translatio… ▽ More While significant progress has been made in benchmarking Large Language Models (LLMs) across various tasks, there is a lack of comprehensive evaluation of their abilities in responding to multi-turn instructions in less-commonly tested languages like Arabic. Our paper offers a detailed examination of the proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized Arabic translation of the MT-Bench benchmark suite, we employ GPT-4 as a uniform evaluator for both English and Arabic queries to assess and compare the performance of the LLMs on various open-ended tasks. Our findings reveal variations in model responses on different task categories, e.g., logic vs. literacy, when instructed in English or Arabic. We find that fine-tuned base models using multilingual and multi-turn datasets could be competitive to models trained from scratch on multilingual data. Finally, we hypothesize that an ensemble of small, open LLMs could perform competitively to proprietary LLMs on the benchmark. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted at SIGARAB ArabicNLP 2023

arXiv:2308.04945 [pdf, other]

LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

Authors: Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam

Abstract: The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, their customization capabilities for specific tasks and datasets are often complex for different users. In this study, we introduce the LLMeBench framework, whi… ▽ More The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, their customization capabilities for specific tasks and datasets are often complex for different users. In this study, we introduce the LLMeBench framework, which can be seamlessly customized to evaluate LLMs for any NLP task, regardless of language. The framework features generic dataset loaders, several model providers, and pre-implements most standard evaluation metrics. It supports in-context learning with zero- and few-shot settings. A specific dataset and task can be evaluated for a given LLM in less than 20 lines of code while allowing full flexibility to extend the framework for custom datasets, models, or tasks. The framework has been tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We open-sourced LLMeBench for the community (https://github.com/qcri/LLMeBench/) and a video demonstrating the framework is available online. (https://youtu.be/9cC2m_abk3A) △ Less

Submitted 26 February, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted as a demo paper at EACL 2024

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

arXiv:2305.14982 [pdf, other]

LAraBench: Benchmarking Arabic AI with Large Language Models

Authors: Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam

Abstract: Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tag… ▽ More Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tagging and content classification across different domains. We utilized models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM, employing zero and few-shot learning techniques to tackle 33 distinct tasks across 61 publicly available datasets. This involved 98 experimental setups, encompassing ~296K data points, ~46 hours of speech, and 30 sentences for Text-to-Speech (TTS). This effort resulted in 330+ sets of experiments. Our analysis focused on measuring the performance gap between SOTA models and LLMs. The overarching trend observed was that SOTA models generally outperformed LLMs in zero-shot learning, with a few exceptions. Notably, larger computational models with few-shot learning techniques managed to reduce these performance gaps. Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks. △ Less

Submitted 5 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Foundation Models, Large Language Models, Arabic NLP, Arabic Speech, Arabic AI, GPT3.5 Evaluation, USM Evaluation, Whisper Evaluation, GPT-4, BLOOMZ, Jais13b

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

arXiv:2304.01233 [pdf, other]

Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department

Authors: Sabri Boughorbel, Fethi Jarray, Abdulaziz Al Homaid, Rashid Niaz, Khalid Alyafei

Abstract: Language modeling have shown impressive progress in generating compelling text with good accuracy and high semantic coherence. An interesting research direction is to augment these powerful models for specific applications using contextual information. In this work, we explore multi-modal language modeling for healthcare applications. We are interested in outcome prediction and patient triage in h… ▽ More Language modeling have shown impressive progress in generating compelling text with good accuracy and high semantic coherence. An interesting research direction is to augment these powerful models for specific applications using contextual information. In this work, we explore multi-modal language modeling for healthcare applications. We are interested in outcome prediction and patient triage in hospital emergency department based on text information in chief complaints and vital signs recorded at triage. We adapt Perceiver - a modality-agnostic transformer-based model that has shown promising results in several applications. Since vital-sign modality is represented in tabular format, we modified Perceiver position encoding to ensure permutation invariance. We evaluated the multi-modal language model for the task of diagnosis code prediction using MIMIC-IV ED dataset on 120K visits. In the experimental analysis, we show that mutli-modality improves the prediction performance compared with models trained solely on text or vital signs. We identified disease categories for which multi-modality leads to performance improvement and show that for these categories, vital signs have added predictive power. By analyzing the cross-attention layer, we show how multi-modality contributes to model predictions. This work gives interesting insights on the development of multi-modal language models for healthcare applications. △ Less

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2111.07419 [pdf, other]

Learning a Shared Model for Motorized Prosthetic Joints to Predict Ankle-Joint Motion

Authors: Sharmita Dey, Sabri Boughorbel, Arndt F. Schilling

Abstract: Control strategies for active prostheses or orthoses use sensor inputs to recognize the user's locomotive intention and generate corresponding control commands for producing the desired locomotion. In this paper, we propose a learning-based shared model for predicting ankle-joint motion for different locomotion modes like level-ground walking, stair ascent, stair descent, slope ascent, and slope d… ▽ More Control strategies for active prostheses or orthoses use sensor inputs to recognize the user's locomotive intention and generate corresponding control commands for producing the desired locomotion. In this paper, we propose a learning-based shared model for predicting ankle-joint motion for different locomotion modes like level-ground walking, stair ascent, stair descent, slope ascent, and slope descent without the need to classify between them. Features extracted from hip and knee joint angular motion are used to continuously predict the ankle angles and moments using a Feed-Forward Neural Network-based shared model. We show that the shared model is adequate for predicting the ankle angles and moments for different locomotion modes without explicitly classifying between the modes. The proposed strategy shows the potential for devising a high-level controller for an intelligent prosthetic ankle that can adapt to different locomotion modes. △ Less

Submitted 14 November, 2021; originally announced November 2021.

Comments: NeurIPS 2021 Workshop Spotlight presentation, Machine Learning for Health (ML4H) 2021 - Extended Abstract

arXiv:2103.04048 [pdf, other]

Fairness in TabNet Model by Disentangled Representation for the Prediction of Hospital No-Show

Authors: Sabri Boughorbel, Fethi Jarray, Abdou Kadri

Abstract: Patient no-shows is a major burden for health centers leading to loss of revenue, increased waiting time and deteriorated health outcome. Developing machine learning (ML) models for the prediction of no -shows could help addressing this important issue. It is crucial to consider fair ML models for no-show prediction in order to ensure equality of opportunity in accessing healthcare services. In th… ▽ More Patient no-shows is a major burden for health centers leading to loss of revenue, increased waiting time and deteriorated health outcome. Developing machine learning (ML) models for the prediction of no -shows could help addressing this important issue. It is crucial to consider fair ML models for no-show prediction in order to ensure equality of opportunity in accessing healthcare services. In this wo rk, we are interested in developing deep learning models for no-show prediction based on tabular data while ensuring fairness properties. Our baseline model, TabNet, uses on attentive feature transforme rs and has shown promising results for tabular data. We propose Fair-TabNet based on representation learning that disentangles predictive from sensitive components. The model is trained to jointly min imize loss functions on no-shows and sensitive variables while ensuring that the sensitive and prediction representations are orthogonal. In the experimental analysis, we used a hospital dataset of 210, 000 appointments collected in 2019. Our preliminary results show that the proposed Fair-TabNet improves the predictive, fairness performance and convergence speed over TabNet for the task of appointment no-show prediction. The comparison with the state-of-the art models for tabular data shows promising results and could be further improved by a better tuning of hyper-parameters. △ Less

Submitted 6 March, 2021; originally announced March 2021.

arXiv:1910.12191 [pdf, other]

Federated Uncertainty-Aware Learning for Distributed Hospital EHR Data

Authors: Sabri Boughorbel, Fethi Jarray, Neethu Venugopal, Shabir Moosa, Haithum Elhadi, Michel Makhlouf

Abstract: Recent works have shown that applying Machine Learning to Electronic Health Records (EHR) can strongly accelerate precision medicine. This requires developing models based on diverse EHR sources. Federated Learning (FL) has enabled predictive modeling using distributed training which lifted the need of sharing data and compromising privacy. Since models are distributed in FL, it is attractive to d… ▽ More Recent works have shown that applying Machine Learning to Electronic Health Records (EHR) can strongly accelerate precision medicine. This requires developing models based on diverse EHR sources. Federated Learning (FL) has enabled predictive modeling using distributed training which lifted the need of sharing data and compromising privacy. Since models are distributed in FL, it is attractive to devise ensembles of Deep Neural Networks that also assess model uncertainty. We propose a new FL model called Federated Uncertainty-Aware Learning Algorithm (FUALA) that improves on Federated Averaging (FedAvg) in the context of EHR. FUALA embeds uncertainty information in two ways: It reduces the contribution of models with high uncertainty in the aggregated model. It also introduces model ensembling at prediction time by keeping the last layers of each hospital from the final round. In FUALA, the Federator (central node) sends at each round the average model to all hospitals as well as a randomly assigned hospital model update to estimate its generalization on that hospital own data. Each hospital sends back its model update as well a generalization estimation of the assigned model. At prediction time, the model outputs C predictions for each sample where C is the number of hospital models. The experimental analysis conducted on a cohort of 87K deliveries for the task of preterm-birth prediction showed that the proposed approach outperforms FedAvg when evaluated on out-of-distribution data. We illustrated how uncertainty could be measured using the proposed approach. △ Less

Submitted 27 October, 2019; originally announced October 2019.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

arXiv:1811.09782 [pdf, ps, other]

Alternating Loss Correction for Preterm-Birth Prediction from EHR Data with Noisy Labels

Authors: Sabri Boughorbel, Fethi Jarray, Neethu Venugopal, Haithum Elhadi

Abstract: In this paper we are interested in the prediction of preterm birth based on diagnosis codes from longitudinal EHR. We formulate the prediction problem as a supervised classification with noisy labels. Our base classifier is a Recurrent Neural Network with an attention mechanism. We assume the availability of a data subset with both noisy and clean labels. For the cohort definition, most of the dia… ▽ More In this paper we are interested in the prediction of preterm birth based on diagnosis codes from longitudinal EHR. We formulate the prediction problem as a supervised classification with noisy labels. Our base classifier is a Recurrent Neural Network with an attention mechanism. We assume the availability of a data subset with both noisy and clean labels. For the cohort definition, most of the diagnosis codes on mothers' records related to pregnancy are ambiguous for the definition of full-term and preterm classes. On the other hand, diagnosis codes on babies' records provide fine-grained information on prematurity. Due to data de-identification, the links between mothers and babies are not available. We developed a heuristic based on admission and discharge times to match babies to their mothers and hence enrich mothers' records with additional information on delivery status. The obtained additional dataset from the matching heuristic has noisy labels and was used to leverage the training of the deep learning model. We propose an Alternating Loss Correction (ALC) method to train deep models with both clean and noisy labels. First, the label corruption matrix is estimated using the data subset with both noisy and clean labels. Then it is used in the model as a dense output layer to correct for the label noise. The network is alternately trained on epochs with the clean dataset with a simple cross-entropy loss and on next epoch with the noisy dataset and a loss corrected with the estimated corruption matrix. The experiments for the prediction of preterm birth at 90 days before delivery showed an improvement in performance compared with baseline and state of-the-art methods. △ Less

Submitted 24 November, 2018; originally announced November 2018.

Comments: Submission Id: 79, Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/79

Showing 1–12 of 12 results for author: Boughorbel, S