
Showing 1–12 of 12 results for author: Boughorbel, S

  1. arXiv:2501.13944

    cs.CL cs.AI

    Fanar: An Arabic-Centric Multimodal Generative AI Platform

    Authors: Fanar Team, Ummar Abbas, Mohammad Shahmeer Ahmad, Firoj Alam, Enes Altinisik, Ehsannedin Asgari, Yazan Boshmaf, Sabri Boughorbel, Sanjay Chawla, Shammur Chowdhury, Fahim Dalvi, Kareem Darwish, Nadir Durrani, Mohamed Elfeky, Ahmed Elmagarmid, Mohamed Eltabakh, Masoomali Fatehkia, Anastasios Fragkopoulos, Maram Hasanain, Majd Hawasly, Mus'ab Husaini, Soon-Gyo Jung, Ji Kim Lucas, Walid Magdy, Safa Messaoud, et al. (17 additional authors not shown)

    Abstract: We present Fanar, a platform for Arabic-centric multimodal generative AI systems that supports language, speech and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in class on well-established benchmarks for similarly sized models. Fanar Star is a 7B (billion) parameter model that was trained from…

    Submitted 18 January, 2025; originally announced January 2025.

    ACM Class: I.2.0; D.2.0

  2. arXiv:2405.14277

    cs.CL

    Improving Language Models Trained on Translated Data with Continual Pre-Training and Dictionary Learning Analysis

    Authors: Sabri Boughorbel, MD Rizwan Parvez, Majd Hawasly

    Abstract: Training LLMs for low-resource languages usually utilizes data augmentation from English using machine translation (MT). This, however, brings a number of challenges to LLM training: there are large costs attached to translating and curating huge amounts of content with high-end machine translation solutions; the translated content carries over cultural biases; and if the translation is not faithf…

    Submitted 7 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 16 pages

  3. arXiv:2405.01114

    cs.LG cs.RO

    Continual Learning from Simulated Interactions via Multitask Prospective Rehearsal for Bionic Limb Behavior Modeling

    Authors: Sharmita Dey, Benjamin Paassen, Sarath Ravindran Nair, Sabri Boughorbel, Arndt F. Schilling

    Abstract: Lower limb amputations and neuromuscular impairments severely restrict mobility, necessitating advancements beyond conventional prosthetics. While motorized bionic limbs show promise, their effectiveness depends on replicating the dynamic coordination of human movement across diverse environments. In this paper, we introduce a model for human behavior in the context of bionic prosthesis control. O…

    Submitted 5 June, 2025; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted at Transactions on Machine Learning Research (TMLR) 2025

  4. arXiv:2403.17068

    cs.CR

    Semantic Ranking for Automated Adversarial Technique Annotation in Security Text

    Authors: Udesh Kumarasinghe, Ahmed Lekssays, Husrev Taha Sencar, Sabri Boughorbel, Charitha Elvitigala, Preslav Nakov

    Abstract: We introduce a new method for extracting structured threat behaviors from threat intelligence text. Our method is based on a multi-stage ranking architecture that allows jointly optimizing for efficiency and effectiveness. Therefore, we believe this problem formulation better aligns with the real-world nature of the task considering the large number of adversary techniques and the extensive body o…

    Submitted 25 March, 2024; originally announced March 2024.

  5. arXiv:2310.14819

    cs.CL

    Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic

    Authors: Sabri Boughorbel, Majd Hawasly

    Abstract: While significant progress has been made in benchmarking Large Language Models (LLMs) across various tasks, there is a lack of comprehensive evaluation of their abilities in responding to multi-turn instructions in less-commonly tested languages like Arabic. Our paper offers a detailed examination of the proficiency of open LLMs in such scenarios in Arabic. Utilizing a customized Arabic translatio…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at SIGARAB ArabicNLP 2023

  6. arXiv:2308.04945

    cs.CL cs.AI

    LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

    Authors: Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam

    Abstract: The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, their customization capabilities for specific tasks and datasets are often complex for different users. In this study, we introduce the LLMeBench framework, whi…

    Submitted 26 February, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted as a demo paper at EACL 2024

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  7. arXiv:2305.14982

    cs.CL cs.AI

    LAraBench: Benchmarking Arabic AI with Large Language Models

    Authors: Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tag…

    Submitted 5 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Foundation Models, Large Language Models, Arabic NLP, Arabic Speech, Arabic AI, GPT3.5 Evaluation, USM Evaluation, Whisper Evaluation, GPT-4, BLOOMZ, Jais13b

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  8. arXiv:2304.01233

    cs.CL cs.LG

    Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department

    Authors: Sabri Boughorbel, Fethi Jarray, Abdulaziz Al Homaid, Rashid Niaz, Khalid Alyafei

    Abstract: Language modeling has shown impressive progress in generating compelling text with good accuracy and high semantic coherence. An interesting research direction is to augment these powerful models for specific applications using contextual information. In this work, we explore multi-modal language modeling for healthcare applications. We are interested in outcome prediction and patient triage in h…

    Submitted 3 April, 2023; originally announced April 2023.

  9. arXiv:2111.07419

    cs.RO cs.LG stat.AP

    Learning a Shared Model for Motorized Prosthetic Joints to Predict Ankle-Joint Motion

    Authors: Sharmita Dey, Sabri Boughorbel, Arndt F. Schilling

    Abstract: Control strategies for active prostheses or orthoses use sensor inputs to recognize the user's locomotive intention and generate corresponding control commands for producing the desired locomotion. In this paper, we propose a learning-based shared model for predicting ankle-joint motion for different locomotion modes like level-ground walking, stair ascent, stair descent, slope ascent, and slope d…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021 Workshop Spotlight presentation, Machine Learning for Health (ML4H) 2021 - Extended Abstract

  10. arXiv:2103.04048

    cs.LG

    Fairness in TabNet Model by Disentangled Representation for the Prediction of Hospital No-Show

    Authors: Sabri Boughorbel, Fethi Jarray, Abdou Kadri

    Abstract: Patient no-shows are a major burden for health centers, leading to loss of revenue, increased waiting times and deteriorated health outcomes. Developing machine learning (ML) models for the prediction of no-shows could help address this important issue. It is crucial to consider fair ML models for no-show prediction in order to ensure equality of opportunity in accessing healthcare services. In th…

    Submitted 6 March, 2021; originally announced March 2021.

  11. arXiv:1910.12191

    cs.LG cs.AI stat.ML

    Federated Uncertainty-Aware Learning for Distributed Hospital EHR Data

    Authors: Sabri Boughorbel, Fethi Jarray, Neethu Venugopal, Shabir Moosa, Haithum Elhadi, Michel Makhlouf

    Abstract: Recent work has shown that applying Machine Learning to Electronic Health Records (EHR) can strongly accelerate precision medicine. This requires developing models based on diverse EHR sources. Federated Learning (FL) has enabled predictive modeling using distributed training, which removes the need to share data and compromise privacy. Since models are distributed in FL, it is attractive to d…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended Abstract

  12. arXiv:1811.09782

    stat.ML cs.AI cs.LG

    Alternating Loss Correction for Preterm-Birth Prediction from EHR Data with Noisy Labels

    Authors: Sabri Boughorbel, Fethi Jarray, Neethu Venugopal, Haithum Elhadi

    Abstract: In this paper we are interested in the prediction of preterm birth based on diagnosis codes from longitudinal EHR. We formulate the prediction problem as a supervised classification with noisy labels. Our base classifier is a Recurrent Neural Network with an attention mechanism. We assume the availability of a data subset with both noisy and clean labels. For the cohort definition, most of the dia…

    Submitted 24 November, 2018; originally announced November 2018.

    Comments: Submission Id: 79, Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/79