-
Controllable Patching for Compute-Adaptive Surrogate Modeling of Partial Differential Equations
Authors:
Payel Mukhopadhyay,
Michael McCabe,
Ruben Ohana,
Miles Cranmer
Abstract:
Patch-based transformer surrogates have become increasingly effective for modeling spatiotemporal dynamics, but the fixed patch size is a major limitation for budget-conscious deployment in production. We introduce two lightweight, architecture-agnostic modules, the Convolutional Kernel Modulator (CKM) and Convolutional Stride Modulator (CSM), that enable dynamic patch size control at inference in patch-based models, without retraining or accuracy loss. Combined with a cyclic patch-size rollout, our method mitigates patch artifacts and improves long-term stability for video-like prediction tasks. Applied to a range of challenging 2D and 3D PDE benchmarks, our approach improves rollout fidelity and runtime efficiency. To our knowledge, this is the first framework to enable inference-time patch-size tunability in patch-based PDE surrogates. Its plug-and-play design makes it broadly applicable across architectures, establishing a general foundation for compute-adaptive modeling in PDE surrogate tasks.
Submitted 12 July, 2025;
originally announced July 2025.
-
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
Authors:
François Rozet,
Ruben Ohana,
Michael McCabe,
Gilles Louppe,
François Lanusse,
Shirley Ho
Abstract:
The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, spanning from architectures to optimizers, that we found critical to train latent-space emulators.
Submitted 25 September, 2025; v1 submitted 3 July, 2025;
originally announced July 2025.
-
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
Authors:
Ruben Ohana,
Michael McCabe,
Lucas Meyer,
Rudy Morel,
Fruzsina J. Agocs,
Miguel Beneitez,
Marsha Berger,
Blakesley Burkhart,
Keaton Burns,
Stuart B. Dalziel,
Drummond B. Fielding,
Daniel Fortunato,
Jared A. Goldberg,
Keiya Hirashima,
Yan-Fei Jiang,
Rich R. Kerswell,
Suryanarayana Maddu,
Jonah Miller,
Payel Mukhopadhyay,
Stefan S. Nixon,
Jeff Shen,
Romain Watteaux,
Bruno Régaldo-Saint Blancard,
François Rozet,
Liam H. Parker,
et al. (2 additional authors not shown)
Abstract:
Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
Submitted 21 February, 2025; v1 submitted 30 November, 2024;
originally announced December 2024.
-
Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Authors:
Siavash Golkar,
Alberto Bietti,
Mariel Pettee,
Michael Eickenberg,
Miles Cranmer,
Keiya Hirashima,
Geraud Krawezik,
Nicholas Lourie,
Michael McCabe,
Rudy Morel,
Ruben Ohana,
Liam Holden Parker,
Bruno Régaldo-Saint Blancard,
Kyunghyun Cho,
Shirley Ho
Abstract:
Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datasets, akin to object detection or region-based scientific analysis. We present theoretical and empirical analysis using both causal and non-causal Transformer architectures, investigating the influence of various positional encodings on performance and interpretability. In particular, we find that causal attention is much better suited for the task, and that using no positional embeddings leads to the best accuracy, though rotary embeddings are competitive and easier to train. We also show that out-of-distribution performance is tightly linked to which tokens the model uses as a bias term.
Submitted 30 May, 2024;
originally announced June 2024.
-
AstroCLIP: A Cross-Modal Foundation Model for Galaxies
Authors:
Liam Parker,
Francois Lanusse,
Siavash Golkar,
Leopoldo Sarra,
Miles Cranmer,
Alberto Bietti,
Michael Eickenberg,
Geraud Krawezik,
Michael McCabe,
Ruben Ohana,
Mariel Pettee,
Bruno Regaldo-Saint Blancard,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used, without any model fine-tuning, for a variety of downstream tasks including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation from both images and spectra, and (4) morphology classification. Our approach to implementing AstroCLIP consists of two parts. First, we embed galaxy images and spectra separately by pretraining separate transformer-based image and spectrum encoders in self-supervised settings. We then align the encoders using a contrastive loss. We apply our method to spectra from the Dark Energy Spectroscopic Instrument and images from its corresponding Legacy Imaging Survey. Overall, we find remarkable performance on all downstream tasks, even relative to supervised baselines. For example, for a task like photometric redshift prediction, we find similar performance to a specifically trained ResNet18, and for additional tasks like physical property estimation (stellar mass, age, metallicity, and sSFR), we beat this supervised baseline by 19% in terms of $R^2$. We also compare our results to a state-of-the-art self-supervised single-modal model for galaxy images, and find that our approach outperforms this benchmark by roughly a factor of two on photometric redshift estimation and physical property prediction in terms of $R^2$, while remaining roughly in line in terms of morphology classification. Ultimately, our approach represents the first cross-modal self-supervised model for galaxies, and the first self-supervised transformer-based architectures for galaxy images and spectra.
Submitted 14 June, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Multiple Physics Pretraining for Physical Surrogate Models
Authors:
Michael McCabe,
Bruno Régaldo-Saint Blancard,
Liam Holden Parker,
Ruben Ohana,
Miles Cranmer,
Alberto Bietti,
Michael Eickenberg,
Siavash Golkar,
Geraud Krawezik,
Francois Lanusse,
Mariel Pettee,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.
Submitted 10 December, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
xVal: A Continuous Numerical Tokenization for Scientific Language Models
Authors:
Siavash Golkar,
Mariel Pettee,
Michael Eickenberg,
Alberto Bietti,
Miles Cranmer,
Geraud Krawezik,
Francois Lanusse,
Michael McCabe,
Ruben Ohana,
Liam Parker,
Bruno Régaldo-Saint Blancard,
Tiberiu Tesileanu,
Kyunghyun Cho,
Shirley Ho
Abstract:
Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science. In this work, we introduce xVal, a strategy for continuously tokenizing numbers within language models that results in a more appropriate inductive bias for scientific applications. By training specially-modified language models from scratch on a variety of scientific datasets formatted as text, we find that xVal generally outperforms other common numerical tokenization strategies on metrics including out-of-distribution generalization and computational efficiency.
Submitted 15 December, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Towards Stability of Autoregressive Neural Operators
Authors:
Michael McCabe,
Peter Harrington,
Shashank Subramanian,
Jed Brown
Abstract:
Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense -- these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. We open-source our code for reproducibility at https://github.com/mikemccabe210/stabilizing_neural_operators.
Submitted 10 December, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Learning to Assimilate in Chaotic Dynamical Systems
Authors:
Michael McCabe,
Jed Brown
Abstract:
The accuracy of simulation-based forecasting in chaotic systems is heavily dependent on high-quality estimates of the system state at the time the forecast is initialized. Data assimilation methods are used to infer these initial conditions by systematically combining noisy, incomplete observations and numerical models of system dynamics to produce effective estimation schemes. We introduce amortized assimilation, a framework for learning to assimilate in dynamical systems from sequences of noisy observations with no need for ground truth data. We motivate the framework by extending powerful results from self-supervised denoising to the dynamical systems setting through the use of differentiable simulation. Experimental results across several benchmark systems highlight the improved effectiveness of our approach over widely-used data assimilation methods.
Submitted 1 November, 2021;
originally announced November 2021.
-
Mapper Comparison with Wasserstein Metrics
Authors:
Michael McCabe
Abstract:
The challenge of describing model drift is an open question in unsupervised learning. It can be difficult to evaluate at what point an unsupervised model has deviated beyond what would be expected from a different sample from the same population. This is particularly true for models without a probabilistic interpretation. One such family of techniques, Topological Data Analysis, and the Mapper algorithm in particular, has found use in a variety of fields, but describing model drift for Mapper graphs is an understudied area as even existing techniques for measuring distances between related constructs like graphs or simplicial complexes fail to account for the fact that Mapper graphs represent a combination of topological, metric, and density information. In this paper, we develop an optimal transport based metric which we call the Network Augmented Wasserstein Distance for evaluating distances between Mapper graphs and demonstrate the value of the metric for model drift analysis by using the metric to transform the model drift problem into an anomaly detection problem over dynamic graphs.
Submitted 14 December, 2018;
originally announced December 2018.