-
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Authors:
Siavash Golkar,
Alberto Bietti,
Mariel Pettee,
Michael Eickenberg,
Miles Cranmer,
Keiya Hirashima,
Geraud Krawezik,
Nicholas Lourie,
Michael McCabe,
Rudy Morel,
Ruben Ohana,
Liam Holden Parker,
Bruno Régaldo-Saint Blancard,
Kyunghyun Cho,
Shirley Ho
Abstract:
Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datas…
▽ More
Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datasets, akin to object detection or region-based scientific analysis. We present theoretical and empirical analysis using both causal and non-causal Transformer architectures, investigating the influence of various positional encodings on performance and interpretability. In particular, we find that causal attention is much better suited for the task, and that no positional embeddings lead to the best accuracy, though rotary embeddings are competitive and easier to train. We also show that out of distribution performance is tightly linked to which tokens it uses as a bias term.
△ Less
Submitted 30 May, 2024;
originally announced June 2024.
-
Multiple Physics Pretraining for Physical Surrogate Models
Authors:
Michael McCabe,
Bruno Régaldo-Saint Blancard,
Liam Holden Parker,
Ruben Ohana,
Miles Cranmer,
Alberto Bietti,
Michael Eickenberg,
Siavash Golkar,
Geraud Krawezik,
Francois Lanusse,
Mariel Pettee,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly…
▽ More
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.
△ Less
Submitted 10 December, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
xVal: A Continuous Numerical Tokenization for Scientific Language Models
Authors:
Siavash Golkar,
Mariel Pettee,
Michael Eickenberg,
Alberto Bietti,
Miles Cranmer,
Geraud Krawezik,
Francois Lanusse,
Michael McCabe,
Ruben Ohana,
Liam Parker,
Bruno Régaldo-Saint Blancard,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science…
▽ More
Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science. In this work, we introduce xVal, a strategy for continuously tokenizing numbers within language models that results in a more appropriate inductive bias for scientific applications. By training specially-modified language models from scratch on a variety of scientific datasets formatted as text, we find that xVal generally outperforms other common numerical tokenization strategies on metrics including out-of-distribution generalization and computational efficiency.
△ Less
Submitted 15 December, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.