Showing 1–24 of 24 results for author: AlKhamissi, B

  1. arXiv:2506.13331

    cs.LG

    Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization

    Authors: Badr AlKhamissi, C. Nicolò De Sabbata, Zeming Chen, Martin Schrimpf, Antoine Bosselut

    Abstract: Human intelligence emerges from the interaction of specialized brain networks, each dedicated to distinct cognitive functions such as language processing, logical reasoning, social understanding, and memory retrieval. Inspired by this biological observation, we introduce the Mixture of Cognitive Reasoners (MiCRo) architecture and training paradigm: a modular transformer-based language model with a…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Preprint. Code, data, and models available at https://bkhmsi.github.io/mixture-of-cog-reasoners.

  2. arXiv:2503.01830

    cs.CL

    From Language to Cognition: How LLMs Outgrow the Human Language Network

    Authors: Badr AlKhamissi, Greta Tuckute, Yingtian Tang, Taha Binhuraib, Antoine Bosselut, Martin Schrimpf

    Abstract: Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language shaping brain-like representations, and their evolution during training as a function of different tasks remain unclear. We here benchmark 34 training checkpoints spanning 300B tokens across 8 different model sizes to analyze how brain alignment relat…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Preprint

  3. arXiv:2411.02280

    cs.CL cs.LG

    The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

    Authors: Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

    Abstract: Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We here ask whether similar specialization for language emerges in LLMs. W…

    Submitted 13 February, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: NAACL 2025

  4. arXiv:2411.00828

    cs.CV cs.LG

    Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data

    Authors: Badr AlKhamissi, Yingtian Tang, Abdülkadir Gökce, Johannes Mehrer, Martin Schrimpf

    Abstract: While today's large language models exhibit impressive abilities in generating human-like text, they require massive amounts of data during training. We here take inspiration from human cognitive development to train models in limited data conditions. Specifically, we present a self-synthesis approach that iterates through four phases: Phase 1 sets up fundamental language abilities, training the mo…

    Submitted 29 October, 2024; originally announced November 2024.

    Comments: Accepted to BabyLM Challenge at CoNLL 2024

  5. arXiv:2410.11516

    cs.CL

    TopoLM: brain-like spatio-functional organization in a topographic language model

    Authors: Neil Rathi, Johannes Mehrer, Badr AlKhamissi, Taha Binhuraib, Nicholas M. Blauch, Martin Schrimpf

    Abstract: Neurons in the brain are spatially organized such that neighbors on tissue often exhibit similar response profiles. In the human language system, experimental studies have observed clusters for syntactic and semantic categories, but the mechanisms underlying this functional organization remain unclear. Here, building on work from the vision literature, we develop TopoLM, a transformer language mod…

    Submitted 15 May, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

  6. arXiv:2410.03748

    cs.CL cs.LG

    Khattat: Enhancing Readability and Concept Representation of Semantic Typography

    Authors: Ahmed Hussein, Alaa Elsetohy, Sama Hadhoud, Tameem Bakr, Yasser Rohaim, Badr AlKhamissi

    Abstract: Designing expressive typography that visually conveys a word's meaning while maintaining readability is a complex task, known as semantic typography. It involves selecting an idea, choosing an appropriate font, and balancing creativity with legibility. We introduce an end-to-end system that automates this process. First, a Large Language Model (LLM) generates imagery ideas for the word, useful for…

    Submitted 1 October, 2024; originally announced October 2024.

  7. arXiv:2406.15109

    cs.CL cs.LG

    Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network

    Authors: Badr AlKhamissi, Greta Tuckute, Antoine Bosselut, Martin Schrimpf

    Abstract: Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most explainable variance of brain activity in current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment to brain data. In this work, we investigate the key architectural components driving the surprisin…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Preprint

  8. arXiv:2403.00180

    cs.CL

    "Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models

    Authors: Karina Halevy, Anna Sotnikova, Badr AlKhamissi, Syrielle Montariol, Antoine Bosselut

    Abstract: Model editing has emerged as a cost-effective strategy to update knowledge stored in language models. However, model editing can have unintended consequences after edits are applied: information unrelated to the edits can also be changed, and other general behaviors of the model can be wrongly altered. In this work, we investigate how model editing methods unexpectedly amplify model biases post-ed…

    Submitted 3 October, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted to EMNLP 2024 Main. 9 pages, 4 figures

  9. arXiv:2402.13231

    cs.CL cs.CY

    Investigating Cultural Alignment of Large Language Models

    Authors: Badr AlKhamissi, Muhammad ElNokrashy, Mai AlKhamissi, Mona Diab

    Abstract: The intricate relationship between language and culture has long been a subject of exploration within the realm of linguistic anthropology. Large Language Models (LLMs), promoted as repositories of collective human knowledge, raise a pivotal question: do these models genuinely encapsulate the diverse knowledge adopted by different cultures? Our study reveals that these models demonstrate greater c…

    Submitted 6 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: ACL 2024 (Main)

  10. arXiv:2401.08919

    cs.CL cs.LG

    A Context-Contrastive Inference Approach To Partial Diacritization

    Authors: Muhammad ElNokrashy, Badr AlKhamissi

    Abstract: Diacritization plays a pivotal role in improving readability and disambiguating the meaning of Arabic texts. Efforts have so far focused on marking every eligible character (Full Diacritization). Comparatively overlooked, Partial Diacritization (PD) is the selection of a subset of characters to be marked to aid comprehension where needed. Research has indicated that excessive diacritic marks can hi…

    Submitted 9 August, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 14 equations, 5 tables, 5 figures

  11. arXiv:2312.00575

    cs.CL

    Instruction-tuning Aligns LLMs to the Human Brain

    Authors: Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut

    Abstract: Instruction-tuning is a widely adopted finetuning method that enables large language models (LLMs) to generate output that more closely resembles human responses. However, no studies have shown that instruction-tuning actually teaches LLMs to process language in a similar manner as humans. We investigate the effect of instruction-tuning on aligning LLM and human language processing mechanisms in t…

    Submitted 9 August, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: COLM 2024

  12. Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word–Definition Alignment

    Authors: Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi

    Abstract: A Reverse Dictionary is a tool enabling users to discover a word based on its provided definition, meaning, or description. Such a technique proves valuable in various scenarios, aiding language learners who possess a description of a word without its identity, and benefiting writers seeking precise terminology. These scenarios often encapsulate what is referred to as the "Tip-of-the-Tongue" (TOT)…

    Submitted 21 January, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Proceedings of ArabicNLP 2023

  13. arXiv:2306.16322

    cs.CL

    Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models

    Authors: Zaid Alyafeai, Maged S. Alshaibani, Badr AlKhamissi, Hamzah Luqman, Ebrahim Alareqi, Ali Fadel

    Abstract: Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning, including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4. Despite having a lower training proportion compared to English, these models also exhibit remarkable capabilities in other languages. In this study, we assess the performance of GPT-3.5…

    Submitted 28 June, 2023; originally announced June 2023.

  14. OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

    Authors: Badr AlKhamissi, Siddharth Verma, Ping Yu, Zhijing Jin, Asli Celikyilmaz, Mona Diab

    Abstract: In this paper, we conduct a thorough investigation into the reasoning capabilities of Large Language Models (LLMs), focusing specifically on the Open Pretrained Transformers (OPT) models as a representative of such models. Our study entails finetuning three different sizes of OPT on a carefully curated reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned without explanatio…

    Submitted 24 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE) at ACL 2023

  15. arXiv:2212.08286

    cs.CL

    ALERT: Adapting Language Models to Reasoning Tasks

    Authors: Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissi, Siddharth Verma, Zhijing Jin, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

    Abstract: Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pre-training and reasoning outside of their training context, or are they simply memorizing their training corpus at finer granularity and have learnt to better understand their context? To tease apart…

    Submitted 7 July, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

  16. arXiv:2209.15168

    cs.CL cs.LG

    Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification

    Authors: Muhammad ElNokrashy, Badr AlKhamissi, Mona Diab

    Abstract: Language Models pretrained on large textual data have been shown to encode different types of knowledge simultaneously. Traditionally, only the features from the last layer are used when adapting to new tasks or data. We put forward that, when using or finetuning deep pretrained models, intermediate layer features that may be relevant to the downstream task are buried too deep to be used efficient…

    Submitted 7 May, 2024; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted Oral Presentation at LREC-COLING 2024; 10 pages, 9 figures

  17. arXiv:2205.12495

    cs.CL

    ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

    Authors: Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

    Abstract: Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next. It is also difficult to collect a large-scale hate speech annotated dataset. In this work, we frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts…

    Submitted 20 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022

    Journal ref: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2109-2120, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

  18. arXiv:2205.07960

    cs.CL

    Meta AI at Arabic Hate Speech 2022: MultiTask Learning with Self-Correction for Hate Speech Classification

    Authors: Badr AlKhamissi, Mona Diab

    Abstract: In this paper, we tackle the Arabic Fine-Grained Hate Speech Detection shared task and demonstrate significant improvements over reported baselines for its three subtasks. The tasks are to predict if a tweet contains (1) Offensive language; and whether it is considered (2) Hate Speech or not and if so, then predict the (3) Fine-Grained Hate Speech label from one of six categories. Our final soluti…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: Accepted at the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5/LREC 2022)

  19. arXiv:2204.06031

    cs.CL cs.AI

    A Review on Language Models as Knowledge Bases

    Authors: Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad

    Abstract: Recently, there has been a surge of interest in the NLP community on the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus will encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge and thus act as a KB. This has a major ad…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Preprint

  20. arXiv:2112.08360

    cs.LG cs.AI

    How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy

    Authors: Badr AlKhamissi, Akshay Srinivasan, Zeb Kurth-Nelson, Sam Ritter

    Abstract: Alchemy is a new meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. Further, Alchemy provides an optional symbolic interface that enables meta-RL research without a large compute budget. In this work, we take the first steps toward using Symbolic Alchemy to identify design choices that enable deep-RL agents to learn…

    Submitted 25 August, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Accepted at the AutoML-Conf 2022 Workshop Track

  21. arXiv:2109.08234

    cs.NE cs.LG

    Deep Spiking Neural Networks with Resonate-and-Fire Neurons

    Authors: Badr AlKhamissi, Muhammad ElNokrashy, David Bernal-Casas

    Abstract: In this work, we explore a new Spiking Neural Network (SNN) formulation with Resonate-and-Fire (RAF) neurons (Izhikevich, 2001) trained with gradient descent via back-propagation. The RAF-SNN, while more biologically plausible, achieves performance comparable to or higher than conventional models in the Machine Learning literature across different network configurations, using similar or fewer par…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: Preprint

  22. arXiv:2104.02959

    cs.LG cs.AI

    The Emergence of Abstract and Episodic Neurons in Episodic Meta-RL

    Authors: Badr AlKhamissi, Muhammad ElNokrashy, Michael Spranger

    Abstract: In this work, we analyze the reinstatement mechanism introduced by Ritter et al. (2018) to reveal two classes of neurons that emerge in the agent's working memory (an epLSTM cell) when trained using episodic meta-RL on an episodic variant of the Harlow visual fixation task. Specifically, Abstract neurons encode knowledge shared across tasks, while Episodic neurons carry information relevant for a…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: This work was accepted at the Learning to Learn Workshop (ICLR 2021)

  23. arXiv:2103.01065

    cs.CL

    Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

    Authors: Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam

    Abstract: In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all of its four subtasks. Tasks are to identify the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at the levels of both country and province. Our final model is an ensemble of variants built on top of M…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: This work was accepted at the Sixth Arabic Natural Language Processing Workshop (EACL/WANLP 2021)

  24. arXiv:2011.00538

    cs.CL cs.LG

    Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization

    Authors: Badr AlKhamissi, Muhammad N. ElNokrashy, Mohamed Gabr

    Abstract: We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark. The core is a two-level recurrence hierarchy that operates on the word and character levels separately, enabling faster training and inference than comparable traditional models. A cross-level attention module further connects the two, and open…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: This work was accepted at the Fifth Arabic Natural Language Processing Workshop (COLING/WANLP 2020)