-
Continuum Attention for Neural Operators
Authors:
Edoardo Calvello,
Nikola B. Kovachki,
Matthew E. Levine,
Andrew M. Stuart
Abstract:
Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.
Submitted 10 June, 2024;
originally announced June 2024.
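A minimal sketch of the claim that practical attention approximates a continuum operator, under simplifying assumptions that are not the paper's setting: u is a scalar function on [0, 1], and the query, key, and value maps are hypothetical scalar weights q, k, v. The continuum operator is then A(u)(x) = ∫ exp(q u(x) k u(y)) v u(y) dy / ∫ exp(q u(x) k u(s)) ds. Softmax attention over N sample points estimates this ratio, because the 1/N factors of the two sums cancel: grid points give a quadrature rule, random points give a Monte Carlo estimate.

```python
import math
import random


def discrete_attention(u, x, points, q=1.0, k=1.0, v=1.0):
    """Softmax attention for one query location x over sampled key locations.

    q, k, v are hypothetical scalar stand-ins for the learned query, key,
    and value maps; u is the input function being attended over.
    """
    scores = [q * u(x) * k * u(y) for y in points]
    shift = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return sum(w * v * u(y) for w, y in zip(weights, points)) / total


def midpoint_grid(n):
    """n midpoints of a uniform partition of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def u(y):
    """Illustrative input function (an assumption, chosen only for the demo)."""
    return math.sin(2.0 * math.pi * y) + y


x_query = 0.3
coarse = discrete_attention(u, x_query, midpoint_grid(64))
fine = discrete_attention(u, x_query, midpoint_grid(4096))

rng = random.Random(0)
samples = [rng.random() for _ in range(50000)]
monte_carlo = discrete_attention(u, x_query, samples)

print("coarse grid :", coarse)
print("fine grid   :", fine)
print("Monte Carlo :", monte_carlo)
```

The three values should agree more closely as the number of points grows, which is the sense in which the discrete mechanism is consistent with a single function-space operator independent of the discretization.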
-
Learning About Structural Errors in Models of Complex Dynamical Systems
Authors:
Jin-Long Wu,
Matthew E. Levine,
Tapio Schneider,
Andrew Stuart
Abstract:
Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure models for the effects of unresolved degrees of freedom often exist and encode important domain-specific knowledge. Building on such closure models and correcting them through learning the structural errors can be an effective way of fusing data with domain knowledge. Here we describe a general approach, principles, and algorithms for learning about structural errors. Key to our approach is to include structural error models inside the models of complex systems, for example, in closure models for unresolved scales. The structural errors then map, usually nonlinearly, to observable data. As a result, however, mismatches between model output and data are only indirectly informative about structural errors, due to a lack of labeled pairs of inputs and outputs of structural error models. Additionally, derivatives of the model may not exist or be readily available. We discuss how structural error models can be learned from indirect data with derivative-free Kalman inversion algorithms and variants, how sparsity constraints enforce a "do no harm" principle, and various ways of modeling structural errors. We also discuss the merits of using non-local and/or stochastic error models. In addition, we demonstrate how data assimilation techniques can assist the learning about structural errors in non-ergodic systems. The concepts and algorithms are illustrated in two numerical examples based on the Lorenz-96 system and a human glucose-insulin model.
Submitted 28 May, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates
Authors:
Ke Alexander Wang,
Matthew E. Levine,
Jiaxin Shi,
Emily B. Fox
Abstract:
Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutritional content from glucose-insulin data and meal covariates. Given macronutritional information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach closely approximates true absorption rates, resulting in better forecasts than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.
Submitted 27 April, 2023;
originally announced April 2023.
-
A Framework for Machine Learning of Model Error in Dynamical Systems
Authors:
Matthew E. Levine,
Andrew M. Stuart
Abstract:
The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisily and partially observed data. We compare pure data-driven learning with hybrid models which incorporate imperfect domain knowledge. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and errors that are memoryless.
First, we study memoryless linear (w.r.t. parametric-dependence) model error from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square-root of T, the time-interval over which training data is specified.
Secondly, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features.
Numerical results are presented (Lorenz '63, Lorenz '96 Multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially-observed data, and illustrate challenges in representing memory by this approach, and in the training of such models.
Submitted 17 August, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.
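A minimal sketch of the memoryless, linear-in-parameters case described above, on a toy problem that is an assumption for illustration and not one of the paper's experiments. The hypothetical true dynamics are dx/dt = -x - 0.5 x^3, the imperfect mechanistic model is f0(x) = -x, and the model error is represented as theta * x^3 with theta fitted by least squares to the residual between observed time derivatives and f0. The data are noise-free and fully observed, which is far simpler than the noisy, partially observed setting the abstract addresses.

```python
def true_rhs(x):
    """Hypothetical ground-truth vector field (assumption for the demo)."""
    return -x - 0.5 * x ** 3


def mechanistic_rhs(x):
    """Imperfect mechanistic model f0: it omits the cubic term."""
    return -x


def feature(x):
    """Single basis function for the linear-in-parameters error model."""
    return x ** 3


def simulate_euler(rhs, x0, dt, steps):
    """Forward-Euler trajectory, used here only to generate training data."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * rhs(xs[-1]))
    return xs


def fit_model_error(xs, dt):
    """Least-squares estimate of theta in dx/dt = f0(x) + theta * feature(x)."""
    num = 0.0
    den = 0.0
    for x_now, x_next in zip(xs[:-1], xs[1:]):
        derivative = (x_next - x_now) / dt          # finite-difference derivative
        residual = derivative - mechanistic_rhs(x_now)  # what f0 fails to explain
        phi = feature(x_now)
        num += residual * phi
        den += phi * phi
    return num / den


dt = 0.01
trajectory = simulate_euler(true_rhs, x0=2.0, dt=dt, steps=500)
theta_hat = fit_model_error(trajectory, dt)


def hybrid_rhs(x):
    """Hybrid model: mechanistic part plus the learned correction."""
    return mechanistic_rhs(x) + theta_hat * feature(x)


print("estimated theta:", theta_hat)
```

Because the training data are generated by the same Euler scheme used to difference them, the fit here is essentially exact; with noisy or partially observed data the derivative estimate would need the data assimilation machinery the abstract mentions.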
-
Ensemble Kalman Methods With Constraints
Authors:
David J. Albers,
Paul-Adrien Blancquart,
Matthew E. Levine,
Elnaz Esmaeilzadeh Seylabi,
Andrew Stuart
Abstract:
Ensemble Kalman methods constitute an increasingly important tool in both state and parameter estimation problems. Their popularity stems from the derivative-free nature of the methodology which may be readily applied when computer code is available for the underlying state-space dynamics (for state estimation) or for the parameter-to-observable map (for parameter estimation). There are many applications in which it is desirable to enforce prior information in the form of equality or inequality constraints on the state or parameter. This paper establishes a general framework for doing so, describing a widely applicable methodology, a theory which justifies the methodology, and a set of numerical experiments exemplifying it.
Submitted 6 September, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
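A minimal sketch of a derivative-free ensemble Kalman update for parameter estimation with an interval constraint, assuming a scalar parameter and a scalar observation. The function name `eki_step`, the forward map, and all numerical values are hypothetical. The constraint is enforced by clipping each updated member to the interval, which in one dimension coincides with minimizing a convex quadratic over that interval; the paper's general constrained formulation is not reproduced here.

```python
import random


def eki_step(thetas, forward, y, noise_var, rng, lower=None, upper=None):
    """One perturbed-observation ensemble Kalman update for a scalar parameter.

    Only evaluations of `forward` are needed; no derivatives are used.
    Optional bounds are enforced by clipping (a one-dimensional projection).
    """
    n = len(thetas)
    outputs = [forward(t) for t in thetas]
    t_mean = sum(thetas) / n
    g_mean = sum(outputs) / n
    # Empirical cross-covariance and output variance of the ensemble.
    c_tg = sum((t - t_mean) * (g - g_mean) for t, g in zip(thetas, outputs)) / (n - 1)
    c_gg = sum((g - g_mean) ** 2 for g in outputs) / (n - 1)
    gain = c_tg / (c_gg + noise_var)

    updated = []
    for t, g in zip(thetas, outputs):
        perturbed_y = y + rng.gauss(0.0, noise_var ** 0.5)
        new_t = t + gain * (perturbed_y - g)
        if lower is not None:
            new_t = max(lower, new_t)
        if upper is not None:
            new_t = min(upper, new_t)
        updated.append(new_t)
    return updated


def forward_map(theta):
    """Hypothetical nonlinear parameter-to-observable map."""
    return theta ** 3 + theta


rng = random.Random(0)
ensemble = [rng.gauss(1.0, 1.0) for _ in range(50)]
observation = forward_map(1.5)  # synthetic datum from an assumed true parameter

for _ in range(20):
    ensemble = eki_step(ensemble, forward_map, observation, 0.01, rng, lower=0.0)

print("ensemble mean after 20 steps:", sum(ensemble) / len(ensemble))
```

For a linear forward map with zero observation noise, a single step moves every member onto the exact solution, which gives a quick sanity check of the update formula.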
-
Offline and online data assimilation for real-time blood glucose forecasting in type 2 diabetes
Authors:
Matthew E Levine,
George Hripcsak,
Lena Mamykina,
Andrew Stuart,
David J Albers
Abstract:
We evaluate the benefits of combining different offline and online data assimilation methodologies to improve personalized blood glucose prediction with type 2 diabetes self-monitoring data. We collect self-monitoring data (nutritional reports and pre- and post-prandial glucose measurements) from 4 individuals with diabetes and 2 individuals without diabetes. We write online to refer to methods that update state and parameters sequentially as nutrition and glucose data are received, and offline to refer to methods that estimate parameters over a fixed data set, distributed over a time window containing multiple nutrition and glucose measurements.
We fit a model of ultradian glucose dynamics to the first half of each data set using offline (MCMC and nonlinear optimization) and online (unscented Kalman filter and an unfiltered model---a dynamical model driven by nutrition data that does not update states) data assimilation methods. Model parameters estimated over the first half of the data are used within online forecasting methods to issue forecasts over the second half of each data set.
Offline data assimilation methods provided consistent advantages in predictive performance and practical usability in 4 of 6 patient data sets compared to online data assimilation methods alone; yet 2 of 6 patients were best predicted with a strictly online approach. Interestingly, parameter estimates generated offline led to worse predictions when fed to a stochastic filter than when used in a simple, unfiltered model that incorporates new nutritional information, but does not update model states based on glucose measurements.
The relative improvements seen from the unfiltered model, when carefully trained offline, expose challenges in model sensitivity and filtering applications, but also open possibilities for improved glucose forecasting and relaxed patient self-monitoring requirements.
Submitted 1 September, 2017;
originally announced September 2017.