-
Constraining Chaos: Enforcing dynamical invariants in the training of recurrent neural networks
Authors:
Jason A. Platt,
Stephen G. Penny,
Timothy A. Smith,
Tse-Chun Chen,
Henry D. I. Abarbanel
Abstract:
Drawing on ergodic theory, we introduce a novel training method for machine learning based forecasting methods for chaotic dynamical systems. The training enforces dynamical invariants--such as the Lyapunov exponent spectrum and fractal dimension--in the systems of interest, enabling longer and more stable forecasts when operating with limited data. The technique is demonstrated in detail using the recurrent neural network architecture of reservoir computing. Results are given for the Lorenz 1996 chaotic dynamical system and a spectral quasi-geostrophic model, both typical test cases for numerical weather prediction.
Submitted 23 April, 2023;
originally announced April 2023.
-
A Systematic Exploration of Reservoir Computing for Forecasting Complex Spatiotemporal Dynamics
Authors:
Jason A. Platt,
Stephen G. Penny,
Timothy A. Smith,
Tse-Chun Chen,
Henry D. I. Abarbanel
Abstract:
A reservoir computer (RC) is a type of simplified recurrent neural network architecture that has demonstrated success in the prediction of spatiotemporally chaotic dynamical systems. A further advantage of RC is that it reproduces intrinsic dynamical quantities essential for its incorporation into numerical forecasting routines such as the ensemble Kalman filter -- used in numerical weather prediction to compensate for sparse and noisy data. We explore here the architecture and design choices for a "best in class" RC for a number of characteristic dynamical systems, and then show the application of these choices in scaling up to larger models using localization. Our analysis points to the importance of large scale parameter optimization. We also note in particular the importance of including input bias in the RC design, which has a significant impact on the forecast skill of the trained RC model. In our tests, the use of a nonlinear readout operator does not affect the forecast time or the stability of the forecast. The effects of the reservoir dimension, spinup time, amount of training data, normalization, noise, and the RC time step are also investigated. While we are not aware of a generally accepted best reported mean forecast time for different models in the literature, we report over a factor of 2 increase in the mean forecast time compared to the best performing RC model of Vlachas et al. (2020) for the 40 dimensional spatiotemporally chaotic Lorenz 1996 dynamics, and we are able to accomplish this using a smaller reservoir size.
Submitted 21 January, 2022;
originally announced January 2022.
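For orientation, the design choices named in the abstract above (reservoir dimension, input bias, spinup time, linear readout) can be placed in a minimal reservoir computer sketch. This is an illustration under stated assumptions, not the authors' implementation: the leaky tanh update rule, the ridge-regression readout, the toy sine input, and every name and hyperparameter value below (`leak`, `beta`, `spinup`, the spectral radius of 0.9) are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 100                  # input and reservoir dimensions (illustrative)
spinup, leak, beta = 50, 0.5, 1e-6    # hypothetical hyperparameter values

# Fixed random reservoir matrix, rescaled to a spectral radius of 0.9.
A = rng.uniform(-1.0, 1.0, (n_res, n_res))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
W_in = rng.uniform(-1.0, 1.0, (n_res, n_in))
bias = rng.uniform(-1.0, 1.0, n_res)  # input bias term

def step(r, u):
    """One leaky tanh reservoir update driven by the input u."""
    return (1.0 - leak) * r + leak * np.tanh(A @ r + W_in @ u + bias)

# Toy scalar signal standing in for real training data.
u = np.sin(0.1 * np.arange(1000))[:, None]

# Drive the reservoir and record its state after each input.
r = np.zeros(n_res)
states = []
for u_t in u[:-1]:
    r = step(r, u_t)
    states.append(r)

# Discard the spinup transient, then pair each state with the next input.
R = np.array(states)[spinup:]
Y = u[1:][spinup:]

# Linear readout fitted by ridge regression (only this matrix is trained).
W_out = np.linalg.solve(R.T @ R + beta * np.eye(n_res), R.T @ Y).T
train_rmse = float(np.sqrt(np.mean((R @ W_out.T - Y) ** 2)))
```

Only `W_out` is fitted; `A`, `W_in`, and `bias` stay fixed after random initialization, which is what makes the architecture a simplified recurrent network.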
-
Integrating Recurrent Neural Networks with Data Assimilation for Scalable Data-Driven State Estimation
Authors:
Stephen G. Penny,
Timothy A. Smith,
Tse-Chun Chen,
Jason A. Platt,
Hsin-Yi Lin,
Michael Goodliff,
Henry D. I. Abarbanel
Abstract:
Data assimilation (DA) is integrated with machine learning in order to perform entirely data-driven online state estimation. To achieve this, recurrent neural networks (RNNs) are implemented as surrogate models to replace key components of the DA cycle in numerical weather prediction (NWP), including the conventional numerical forecast model, the forecast error covariance matrix, and the tangent linear and adjoint models. It is shown how these RNNs can be initialized using DA methods to directly update the hidden/reservoir state with observations of the target system. The results indicate that these techniques can be applied to estimate the state of a system for the repeated initialization of short-term forecasts, even in the absence of a traditional numerical forecast model. Further, it is demonstrated how these integrated RNN-DA methods can scale to higher dimensions by applying domain localization and parallelization, providing a path for practical applications in NWP.
Submitted 24 September, 2021;
originally announced September 2021.
-
Forecasting Using Reservoir Computing: The Role of Generalized Synchronization
Authors:
Jason A. Platt,
Adrian Wong,
Randall Clark,
Stephen G. Penny,
Henry D. I. Abarbanel
Abstract:
Reservoir computers (RC) are a form of recurrent neural network (RNN) used for forecasting time series data. As with all RNNs, selecting the hyperparameters presents a challenge when training on new inputs. We present a method based on generalized synchronization (GS) that gives direction in designing and evaluating the architecture and hyperparameters of an RC. The 'auxiliary method' for detecting GS provides a pre-training test that guides hyperparameter selection. Furthermore, we provide a metric for a "well trained" RC using the reproduction of the input system's Lyapunov exponents.
Submitted 14 April, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
-
Machine Learning Classification Informed by a Functional Biophysical System
Authors:
Jason A. Platt,
Anna Miller,
Lawson Fuller,
Henry D. I. Abarbanel
Abstract:
We present a novel machine learning architecture for classification suggested by experiments on olfactory systems. The network separates input stimuli, represented as spatially distinct currents, via winnerless competition---a process based on the intrinsic sequential dynamics of the neural system---then uses a support vector machine (SVM) to provide precision to the space-time separation of the output. The combined network uses biophysical models of neurons and shows high discrimination among inputs and robustness to noise. While using the SVM alone does not permit determination of the components of mixtures of classified inputs, the combined network is able to tell the precise concentrations of the constituent parts.
Submitted 16 June, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Precision annealing Monte Carlo methods for statistical data assimilation and machine learning
Authors:
Zheng Fang,
Adrian S. Wong,
Kangbo Hao,
Alexander J. A. Ty,
Henry D. I. Abarbanel
Abstract:
In statistical data assimilation (SDA) and supervised machine learning (ML), we wish to transfer information from observations to a model of the processes underlying those observations. For SDA, the model consists of a set of differential equations that describe the dynamics of a physical system. For ML, the model is usually constructed using other strategies. In this paper, we develop a systematic formulation based on Monte Carlo sampling to achieve such information transfer. Following the derivation of an appropriate target distribution, we present the formulation based on the standard Metropolis-Hastings (MH) procedure and the Hamiltonian Monte Carlo (HMC) method for performing the high dimensional integrals that appear. To the extensive literature on MH and HMC, we add (1) an annealing method using a hyperparameter that governs the precision of the model to identify and explore the highest probability regions of phase space dominating those integrals, and (2) a strategy for initializing the state space search. The efficacy of the proposed formulation is demonstrated using a nonlinear dynamical model with chaotic solutions widely used in geophysics.
Submitted 21 January, 2020; v1 submitted 6 July, 2019;
originally announced July 2019.
-
Machine Learning of Time Series Using Time-delay Embedding and Precision Annealing
Authors:
Alexander J. A. Ty,
Zheng Fang,
Rivver A. Gonzalez,
Paul J. Rozdeba,
Henry D. I. Abarbanel
Abstract:
Tasking machine learning to predict segments of a time series requires estimating the parameters of an ML model with input/output pairs from the time series. Using the equivalence between statistical data assimilation and supervised machine learning, we revisit this task. The training method for the machine utilizes a precision annealing approach to identifying the global minimum of the action (-log[P]). In this way we are able to identify the number of training pairs required to produce good generalizations (predictions) for the time series. We proceed from a scalar time series $s(t_n)$, with $t_n = t_0 + n \Delta t$, and using methods of nonlinear time series analysis show how to produce a $D_E > 1$ dimensional time delay embedding space in which the time series has no false neighbors, unlike the observed scalar time series $s(t_n)$. In that $D_E$-dimensional space we explore the use of feed forward multi-layer perceptrons as network models operating on $D_E$-dimensional input and producing $D_E$-dimensional outputs.
Submitted 14 June, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
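The time-delay embedding step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `delay_embed`, the forward-in-time delay convention, the sine test signal, and the values `dim=3` and `lag=5` are assumptions chosen for the example. In practice the embedding dimension $D_E$ would be selected with a false-nearest-neighbors test, which is not implemented here.

```python
import numpy as np

def delay_embed(s, dim, lag=1):
    """Map a scalar series s(t_n) to vectors [s(t_n), s(t_n + lag), ...].

    Returns an array of shape (len(s) - (dim - 1) * lag, dim).
    """
    s = np.asarray(s, dtype=float)
    n_vectors = len(s) - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("time series too short for this dim and lag")
    # Column i holds the series shifted forward by i * lag samples.
    return np.stack([s[i * lag : i * lag + n_vectors] for i in range(dim)], axis=1)

# Toy scalar series standing in for observed data.
s = np.sin(0.1 * np.arange(200))
X = delay_embed(s, dim=3, lag=5)

# Input/output pairs for a network mapping D_E-dimensional input
# to the D_E-dimensional vector one time step later.
inputs, targets = X[:-1], X[1:]
```

Each row of `inputs` is paired with the embedded vector one step ahead in `targets`, which gives the kind of input/output training pairs the abstract refers to.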
-
Machine Learning as Statistical Data Assimilation
Authors:
H. D. I. Abarbanel,
P. J. Rozdeba,
S. Shirman
Abstract:
We identify a strong equivalence between neural network based machine learning (ML) methods and the formulation of statistical data assimilation (DA), known to be a problem in statistical physics. DA, as used widely in physical and biological sciences, systematically transfers information in observations to a model of the processes producing the observations. The correspondence is that layer label in the ML setting is the analog of time in the data assimilation setting. Utilizing aspects of this equivalence we discuss how to establish the global minimum of the cost functions in the ML context, using a variational annealing method from DA. This provides a design method for optimal networks for ML applications and may serve as the basis for understanding the success of "deep learning". Results from an ML example are presented.
When the layer label is taken to be continuous, the Euler-Lagrange equation for the ML optimization problem is an ordinary differential equation, and we see that the problem being solved is a two point boundary value problem. The use of continuous layers is denoted "deepest learning". The Hamiltonian version provides a direct rationale for back propagation as a solution method for the canonical momentum; however, it suggests other solution methods are to be preferred.
Submitted 19 October, 2017;
originally announced October 2017.