-
Palm: A Culturally Inclusive and Linguistically Diverse Dataset for Arabic LLMs
Authors:
Fakhraddin Alwajih,
Abdellah El Mekki,
Samar Mohamed Magdy,
Abdelrahim A. Elmadany,
Omer Nacar,
El Moatez Billah Nagoudi,
Reem Abdel-Salam,
Hanin Atwany,
Youssef Nafea,
Abdulfattah Mohammed Yahya,
Rahaf Alhamouri,
Hamzah A. Alsayadi,
Hiba Zayed,
Sara Shatnawi,
Serry Sibaee,
Yasir Ech-Chammakhy,
Walid Al-Dhabyani,
Marwa Mohamed Ali,
Imen Jarraya,
Ahmed Oumar El-Shangiti,
Aisha Alraeesi,
Mohammed Anwar Al-Ghrawi,
Abdulrahman S. Al-Batati,
Elgizouli Mohamed,
Noha Taha Elgindi,
et al. (19 additional authors not shown)
Abstract:
As large language models (LLMs) become increasingly integrated into daily life, ensuring their cultural sensitivity and inclusivity is paramount. We introduce Palm, a dataset produced by a year-long community-driven project covering all 22 Arab countries. The dataset includes instructions (input, response pairs) in both Modern Standard Arabic (MSA) and dialectal Arabic (DA), spanning 20 diverse topics. Built by a team of 44 researchers across the Arab world, all of whom are authors of this paper, our dataset offers a broad, inclusive perspective. We use our dataset to evaluate the cultural and dialectal capabilities of several frontier LLMs, revealing notable limitations. For instance, while closed-source LLMs generally exhibit strong performance, they are not without flaws, and smaller open-source models face greater challenges. Moreover, certain countries (e.g., Egypt, the UAE) appear better represented than others (e.g., Iraq, Mauritania, Yemen). Our annotation guidelines, code, and data for reproducibility are publicly available.
Submitted 28 February, 2025;
originally announced March 2025.
-
SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments
Authors:
Khaled Gabr,
Mohamed Abdelkader,
Imen Jarraya,
Abdullah AlMusalami,
Anis Koubaa
Abstract:
In the field of sensor fusion and state estimation for object detection and localization, ensuring accurate tracking in dynamic environments poses significant challenges. Traditional methods like the Kalman Filter (KF) often fail when measurements are intermittent, leading to rapid divergence in state estimations. To address this, we introduce SMART (Sensor Measurement Augmentation and Reacquisition Tracker), a novel approach that leverages high-frequency state estimates from the KF to guide the search for new measurements, maintaining tracking continuity even when direct measurements falter. This is crucial for dynamic environments where traditional methods struggle. Our contributions include: 1) Versatile Measurement Augmentation Using KF Feedback: We implement a versatile measurement augmentation system that serves as a backup when primary object detectors fail intermittently. This system is adaptable to various sensors, demonstrated using depth cameras where KF's 3D predictions are projected into 2D depth image coordinates, integrating nonlinear covariance propagation techniques simplified to first-order approximations. 2) Open-source ROS2 Implementation: We provide an open-source ROS2 implementation of the SMART-TRACK framework, validated in a realistic simulation environment using Gazebo and ROS2, fostering broader adaptation and further research. Our results showcase significant enhancements in tracking stability, with estimation RMSE as low as 0.04 m during measurement disruptions, advancing the robustness of UAV tracking and expanding the potential for reliable autonomous UAV operations in complex scenarios. The implementation is available at https://github.com/mzahana/SMART-TRACK.
Submitted 14 October, 2024;
originally announced October 2024.
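The SMART-TRACK abstract mentions projecting the KF's 3D predictions into 2D depth image coordinates with first-order covariance propagation. A minimal sketch of that general idea follows, assuming a simple pinhole camera model; the function name `project_with_covariance` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions and are not taken from the SMART-TRACK code.

```python
def project_with_covariance(mean, cov, fx, fy, cx, cy):
    """Project a 3D point estimate (camera frame) to pixel coordinates and
    propagate its 3x3 covariance to a 2x2 pixel covariance.

    Assumes a pinhole model (hypothetical here): u = fx*x/z + cx, v = fy*y/z + cy.
    The covariance is propagated to first order: cov_uv = J * cov * J^T,
    where J is the Jacobian of the projection evaluated at the mean.
    """
    x, y, z = mean
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")

    # Projected pixel coordinates of the predicted position
    u = fx * x / z + cx
    v = fy * y / z + cy

    # 2x3 Jacobian of (u, v) with respect to (x, y, z)
    J = [
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ]

    # JC = J @ cov (2x3), then cov_uv = JC @ J^T (2x2)
    JC = [[sum(J[i][k] * cov[k][j] for k in range(3)) for j in range(3)]
          for i in range(2)]
    cov_uv = [[sum(JC[i][k] * J[j][k] for k in range(3)) for j in range(2)]
              for i in range(2)]
    return (u, v), cov_uv
```

The resulting 2x2 pixel covariance could, for example, define a search region in the depth image around the projected point when the primary detector drops out; how SMART-TRACK actually uses it is described in the paper and repository, not here.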
-
TinySiamese Network for Biometric Analysis
Authors:
Islem Jarraya,
Tarek M. Hamdani,
Habib Chabchoub,
Adel M. Alimi
Abstract:
Biometric recognition is the process of verifying or classifying human characteristics in images or videos. It is a complex task that requires machine learning algorithms, including convolutional neural networks (CNNs) and Siamese networks. However, these algorithms have several limitations when used for image verification and classification. Training can be computationally intensive, requiring specialized hardware and significant computational resources to train and deploy. It also requires a large amount of labeled data, which can be time-consuming and costly to obtain. The main advantage of the proposed TinySiamese over the standard Siamese is that it does not require the whole CNN for training. Using a pre-trained CNN as a feature extractor and the TinySiamese to learn the extracted features gave almost the same performance and efficiency as the standard Siamese for biometric verification. In this way, the TinySiamese addresses the memory and computation-time problems with a small number of layers (no more than 7). It can run on low-power machines that have a standard GPU and cannot allocate a large amount of RAM. Using TinySiamese with only 8 GB of memory, the matching time decreased by 76.78% on the B2F (Biometric images of Fingerprints and Faces), FVC2000, FVC2002 and FVC2004 datasets, while the training time for 10 epochs went down by approximately 93.14% on the B2F, FVC2002, THDD-part1 and CASIA-B datasets. The accuracy of the fingerprint, gait (NM-angle 180 degree) and face verification tasks was better than that of a standard Siamese by 0.87%, 20.24% and 3.85% respectively. TinySiamese achieved accuracy comparable to related works for the fingerprint and gait classification tasks.
Submitted 2 July, 2023;
originally announced July 2023.
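The TinySiamese abstract describes a small network that learns from features already extracted by a pre-trained CNN, rather than training the whole CNN. The sketch below illustrates that general pattern only: a single shared dense layer is applied to two precomputed feature vectors and the embeddings are compared by Euclidean distance. The layer shape, the distance measure, the threshold rule, and all function names are assumptions for illustration, not the architecture from the paper.

```python
import math

def embed(features, weights, bias):
    """Apply one shared dense layer with ReLU to a precomputed feature vector.

    `features` stands in for the output of a frozen, pre-trained CNN
    (hypothetical here); `weights` has one row per output unit.
    """
    return [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, bias)]

def siamese_distance(feat_a, feat_b, weights, bias):
    """Embed both inputs with the SAME weights, then take Euclidean distance."""
    emb_a = embed(feat_a, weights, bias)
    emb_b = embed(feat_b, weights, bias)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(emb_a, emb_b)))

def same_identity(feat_a, feat_b, weights, bias, threshold):
    """Verification decision: a small distance is treated as a match."""
    return siamese_distance(feat_a, feat_b, weights, bias) < threshold
```

Because only the small head operates on fixed feature vectors, the expensive CNN forward pass can be run once per image and cached, which is consistent with the reduced training and matching times the abstract reports.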