Showing 1–2 of 2 results for author: Thiombiano, A M O

  1. arXiv:2505.01459

    cs.CL cs.AI cs.LG

    MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling

    Authors: Abdoul Majid O. Thiombiano, Brahim Hnich, Ali Ben Mrad, Mohamed Wiem Mkaouer

    Abstract: This paper introduces MoxE, a novel architecture that synergistically combines the Extended Long Short-Term Memory (xLSTM) with the Mixture of Experts (MoE) framework to address critical scalability and efficiency challenges in large language models (LLMs). The proposed method effectively leverages xLSTM's innovative memory structures while strategically introducing sparsity through MoE to substan…

    Submitted 1 May, 2025; originally announced May 2025.

  2. arXiv:2503.18565

    cs.LG cs.AI cs.CL

    Distil-xLSTM: Learning Attention Mechanisms through Recurrent Structures

    Authors: Abdoul Majid O. Thiombiano, Brahim Hnich, Ali Ben Mrad, Mohamed Wiem Mkaouer

    Abstract: The current era of Natural Language Processing (NLP) is dominated by Transformer models. However, novel architectures relying on recurrent mechanisms, such as xLSTM and Mamba, have been proposed as alternatives to attention-based models. Although computation is done differently than with the attention mechanism, these recurrent models yield good results and sometimes even outperform stat…

    Submitted 24 March, 2025; originally announced March 2025.