-
ALLaM: Large Language Models for Arabic and English
Authors:
M Saiful Bari,
Yazeed Alnumay,
Norah A. Alzahrani,
Nouf M. Alotaibi,
Hisham A. Alyahya,
Sultan AlRashed,
Faisal A. Mirza,
Shaykhah Z. Alsubaie,
Hassan A. Alahmed,
Ghadah Alabduljabbar,
Raghad Alkhathran,
Yousef Almushayqih,
Raneem Alnajim,
Salman Alsubaihi,
Maryam Al Mansour,
Majed Alrubaian,
Ali Alammari,
Zaki Alawami,
Abdulmohsen Al-Thubaity,
Ahmed Abdelali,
Jeril Kuriakose,
Abdalghani Abujabal,
Nora Al-Twairesh,
Areeb Alowisheq,
Haidar Khan
Abstract:
We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained considering the values of language alignment and knowledge transfer at scale. Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining on a mixture…
▽ More
We present ALLaM: Arabic Large Language Model, a series of large language models to support the ecosystem of Arabic Language Technologies (ALT). ALLaM is carefully trained considering the values of language alignment and knowledge transfer at scale. Our autoregressive decoder-only architecture models demonstrate how second-language acquisition via vocabulary expansion and pretraining on a mixture of Arabic and English text can steer a model towards a new language (Arabic) without any catastrophic forgetting in the original language (English). Furthermore, we highlight the effectiveness of using parallel/translated data to aid the process of knowledge alignment between languages. Finally, we show that extensive alignment with human preferences can significantly enhance the performance of a language model compared to models of a larger scale with lower quality alignment. ALLaM achieves state-of-the-art performance in various Arabic benchmarks, including MMLU Arabic, ACVA, and Arabic Exams. Our aligned models improve both in Arabic and English from their base aligned models.
△ Less
Submitted 22 July, 2024;
originally announced July 2024.
-
On Robust Learning from Noisy Labels: A Permutation Layer Approach
Authors:
Salman Alsubaihi,
Mohammed Alkhrashi,
Raied Aljadaany,
Fahad Albalawi,
Bernard Ghanem
Abstract:
The existence of label noise imposes significant challenges (e.g., poor generalization) on the training process of deep neural networks (DNN). As a remedy, this paper introduces a permutation layer learning approach termed PermLL to dynamically calibrate the training process of the DNN subject to instance-dependent and instance-independent label noise. The proposed method augments the architecture…
▽ More
The existence of label noise imposes significant challenges (e.g., poor generalization) on the training process of deep neural networks (DNN). As a remedy, this paper introduces a permutation layer learning approach termed PermLL to dynamically calibrate the training process of the DNN subject to instance-dependent and instance-independent label noise. The proposed method augments the architecture of a conventional DNN by an instance-dependent permutation layer. This layer is essentially a convex combination of permutation matrices that is dynamically calibrated for each sample. The primary objective of the permutation layer is to correct the loss of noisy samples mitigating the effect of label noise. We provide two variants of PermLL in this paper: one applies the permutation layer to the model's prediction, while the other applies it directly to the given noisy label. In addition, we provide a theoretical comparison between the two variants and show that previous methods can be seen as one of the variants. Finally, we validate PermLL experimentally and show that it achieves state-of-the-art performance on both real and synthetic datasets.
△ Less
Submitted 28 November, 2022;
originally announced November 2022.
-
Network Moments: Extensions and Sparse-Smooth Attacks
Authors:
Modar Alfadly,
Adel Bibi,
Emilio Botero,
Salman Alsubaihi,
Bernard Ghanem
Abstract:
The impressive performance of deep neural networks (DNNs) has immensely strengthened the line of research that aims at theoretically analyzing their effectiveness. This has incited research on the reaction of DNNs to noisy input, namely developing adversarial input attacks and strategies that lead to robust DNNs to these attacks. To that end, in this paper, we derive exact analytic expressions for…
▽ More
The impressive performance of deep neural networks (DNNs) has immensely strengthened the line of research that aims at theoretically analyzing their effectiveness. This has incited research on the reaction of DNNs to noisy input, namely developing adversarial input attacks and strategies that lead to robust DNNs to these attacks. To that end, in this paper, we derive exact analytic expressions for the first and second moments (mean and variance) of a small piecewise linear (PL) network (Affine, ReLU, Affine) subject to Gaussian input. In particular, we generalize the second-moment expression of Bibi et al. to arbitrary input Gaussian distributions, dropping the zero-mean assumption. We show that the new variance expression can be efficiently approximated leading to much tighter variance estimates as compared to the preliminary results of Bibi et al. Moreover, we experimentally show that these expressions are tight under simple linearizations of deeper PL-DNNs, where we investigate the effect of the linearization sensitivity on the accuracy of the moment estimates. Lastly, we show that the derived expressions can be used to construct sparse and smooth Gaussian adversarial attacks (targeted and non-targeted) that tend to lead to perceptually feasible input attacks.
△ Less
Submitted 21 June, 2020;
originally announced June 2020.
-
Expected Tight Bounds for Robust Training
Authors:
Salman Alsubaihi,
Adel Bibi,
Modar Alfadly,
Abdullah Hamdi,
Bernard Ghanem
Abstract:
Training Deep Neural Networks that are robust to norm bounded adversarial attacks remains an elusive problem. While exact and inexact verification-based methods are generally too expensive to train large networks, it was demonstrated that bounded input intervals can be inexpensively propagated from a layer to another through deep networks. This interval bound propagation approach (IBP) not only ha…
▽ More
Training Deep Neural Networks that are robust to norm bounded adversarial attacks remains an elusive problem. While exact and inexact verification-based methods are generally too expensive to train large networks, it was demonstrated that bounded input intervals can be inexpensively propagated from a layer to another through deep networks. This interval bound propagation approach (IBP) not only has improved both robustness and certified accuracy but was the first to be employed on large/deep networks. However, due to the very loose nature of the IBP bounds, the required training procedure is complex and involved. In this paper, we closely examine the bounds of a block of layers composed in the form of Affine-ReLU-Affine. To this end, we propose expected tight bounds (true bounds in expectation), referred to as ETB, which are provably tighter than IBP bounds in expectation. We then extend this result to deeper networks through blockwise propagation and show that we can achieve orders of magnitudes tighter bounds compared to IBP. Furthermore, using a simple standard training procedure, we can achieve impressive robustness-accuracy trade-off on both MNIST and CIFAR10.
△ Less
Submitted 12 June, 2021; v1 submitted 28 May, 2019;
originally announced May 2019.