Showing 1–1 of 1 results for author: Lal, V

  1. arXiv:2412.07446

    cs.AI cs.CL cs.LG stat.ML

    A Causal World Model Underlying Next Token Prediction: Exploring GPT in a Controlled Environment

    Authors: Raanan Y. Rohekar, Yaniv Gurwicz, Sungduk Yu, Estelle Aflalo, Vasudev Lal

    Abstract: Are generative pre-trained transformer (GPT) models, trained only to predict the next token, implicitly learning a world model from which sequences are generated one token at a time? We address this question by deriving a causal interpretation of the attention mechanism in GPT and presenting a causal world model that arises from this interpretation. Furthermore, we propose that GPT models, at infe…

    Submitted 6 July, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

    Comments: International Conference on Machine Learning (ICML), 2025