Showing 1–2 of 2 results for author: Musat, T

  1. arXiv:2411.12118  [pdf, other]

    cs.LG cs.CL

    Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers

    Authors: Tiberiu Musat

    Abstract: In this paper, I introduce the retrieval problem, a simple yet common reasoning task that can be solved only by transformers with a minimum number of layers, which grows logarithmically with the input size. I empirically show that large language models can solve the task under different prompting formulations without any fine-tuning. To understand how transformers solve the retrieval problem, I tr…

    Submitted 29 March, 2025; v1 submitted 18 November, 2024; originally announced November 2024.

  2. arXiv:2408.09414  [pdf, other]

    cs.LG

    Clustering and Alignment: Understanding the Training Dynamics in Modular Addition

    Authors: Tiberiu Musat

    Abstract: Recent studies have revealed that neural networks learn interpretable algorithms for many simple problems. However, little is known about how these algorithms emerge during training. In this article, I study the training dynamics of a small neural network with 2-dimensional embeddings on the problem of modular addition. I observe that embedding vectors tend to organize into two types of structures…

    Submitted 27 October, 2024; v1 submitted 18 August, 2024; originally announced August 2024.