Search | arXiv e-print repository

Challenges of learning multi-scale dynamics with AI weather models: Implications for stability and one solution

Authors: Ashesh Chattopadhyay, Y. Qiang Sun, Pedram Hassanzadeh

Abstract: Long-term stability and physical consistency are critical properties for AI-based weather models if they are going to be used for subseasonal-to-seasonal forecasts or beyond, e.g., climate change projection. However, current AI-based weather models can only provide short-term forecasts accurately since they become unstable or physically inconsistent when time-integrated beyond a few weeks or a few months. Either they exhibit numerical blow-up or hallucinate unrealistic dynamics of the atmospheric variables, akin to the current class of autoregressive large language models. The cause of the instabilities is unknown, and the methods that are used to improve their stability horizons are ad-hoc and lack rigorous theory. In this paper, we reveal that the universal causal mechanism for these instabilities in any turbulent flow is due to \textit{spectral bias} wherein, \textit{any} deep learning architecture is biased to learn only the large-scale dynamics and ignores the small scales completely. We further elucidate how turbulence physics and the absence of convergence in deep learning-based time-integrators amplify this bias, leading to unstable error propagation. Finally, using the quasi-geostrophic flow and European Center for Medium-Range Weather Forecasting (ECMWF) Reanalysis data as test cases, we bridge the gap between deep learning theory and numerical analysis to propose one mitigative solution to such unphysical behavior. We develop long-term physically-consistent data-driven models for the climate system and demonstrate accurate short-term forecasts, and hundreds of years of time-integration with accurate mean and variability.

Submitted 7 December, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Supplementary information is given at https://drive.google.com/file/d/1xMPlC5z4kqc7ZrYY--Be6Dzqo_7xOpyi/view?usp=drive_link

arXiv:2301.03758 [pdf, other]

Sequential Fair Resource Allocation under a Markov Decision Process Framework

Authors: Parisa Hassanzadeh, Eleonora Kreacic, Sihan Zeng, Yuchen Xiao, Sumitra Ganesh

Abstract: We study the sequential decision-making problem of allocating a limited resource to agents that reveal their stochastic demands on arrival over a finite horizon. Our goal is to design fair allocation algorithms that exhaust the available resource budget. This is challenging in sequential settings where information on future demands is not available at the time of decision-making. We formulate the problem as a discrete time Markov decision process (MDP). We propose a new algorithm, SAFFE, that makes fair allocations with respect to the entire demands revealed over the horizon by accounting for expected future demands at each arrival time. The algorithm introduces regularization which enables the prioritization of current revealed demands over future potential demands depending on the uncertainty in agents' future demands. Using the MDP formulation, we show that SAFFE optimizes allocations based on an upper bound on the Nash Social Welfare fairness objective, and we bound its gap to optimality with the use of concentration bounds on total future demands. Using synthetic and real data, we compare the performance of SAFFE against existing approaches and a reinforcement learning policy trained on the MDP. We show that SAFFE leads to more fair and efficient allocations and achieves close-to-optimal performance in settings with dense arrivals.

Submitted 16 June, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2205.02902 [pdf, other]

doi 10.1016/j.cma.2022.115810

Lagrangian PINNs: A causality-conforming solution to failure modes of physics-informed neural networks

Authors: Rambod Mojgani, Maciej Balajewicz, Pedram Hassanzadeh

Abstract: Physics-informed neural networks (PINNs) leverage neural-networks to find the solutions of partial differential equation (PDE)-constrained optimization problems with initial conditions and boundary conditions as soft constraints. These soft constraints are often considered to be the sources of the complexity in the training phase of PINNs. Here, we demonstrate that the challenge of training (i) persists even when the boundary conditions are strictly enforced, and (ii) is closely related to the Kolmogorov n-width associated with problems demonstrating transport, convection, traveling waves, or moving fronts. Given this realization, we describe the mechanism underlying the training schemes such as those used in eXtended PINNs (XPINN), curriculum regularization, and sequence-to-sequence learning. For an important category of PDEs, i.e., governed by non-linear convection-diffusion equation, we propose reformulating PINNs on a Lagrangian frame of reference, i.e., LPINNs, as a PDE-informed solution. A parallel architecture with two branches is proposed. One branch solves for the state variables on the characteristics, and the second branch solves for the low-dimensional characteristics curves. The proposed architecture conforms to the causality innate to the convection, and leverages the direction of travel of the information in the domain. Finally, we demonstrate that the loss landscapes of LPINNs are less sensitive to the so-called "complexity" of the problems, compared to those in the traditional PINNs in the Eulerian framework.

Submitted 5 May, 2022; originally announced May 2022.

Comments: 15 pages, 12 figures

arXiv:2203.08019 [pdf, other]

Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State Abstraction

Authors: Marc Rigter, Danial Dervovic, Parisa Hassanzadeh, Jason Long, Parisa Zehtabi, Daniele Magazzeni

Abstract: We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks into a no buffer queue which are processed by $N$ identical servers. Each task has a price, which is a positive real number, and a class. Each class of task has a different price distribution and service rate, and arrives according to an inhomogeneous Poisson process. The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon. We formulate the problem as a discrete time Markov Decision Process (MDP) with a hybrid state space. We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly. Moreover, we prove that as the time step is reduced, the discrete time solution approaches the optimal solution to the original continuous time problem. To improve the scalability of our approach to a greater number of task classes, we present an approximation based on state abstraction. We validate our approach on synthetic data, as well as a real financial fraud data set, which is the motivating application for this work.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 7+1 pages main text, 16 pages supplementary material, accepted to AAAI 2022

arXiv:2110.00546 [pdf, other]

doi 10.1063/5.0091282

Discovery of interpretable structural model errors by combining Bayesian sparse regression and data assimilation: A chaotic Kuramoto-Sivashinsky test case

Authors: Rambod Mojgani, Ashesh Chattopadhyay, Pedram Hassanzadeh

Abstract: Models of many engineering and natural systems are imperfect. The discrepancy between the mathematical representations of a true physical system and its imperfect model is called the model error. These model errors can lead to substantial differences between the numerical solutions of the model and the state of the system, particularly in those involving nonlinear, multi-scale phenomena. Thus, there is increasing interest in reducing model errors, particularly by leveraging the rapidly growing observational data to understand their physics and sources. Here, we introduce a framework named MEDIDA: Model Error Discovery with Interpretability and Data Assimilation. MEDIDA only requires a working numerical solver of the model and a small number of noise-free or noisy sporadic observations of the system. In MEDIDA, first the model error is estimated from differences between the observed states and model-predicted states (the latter are obtained from a number of one-time-step numerical integrations from the previous observed states). If observations are noisy, a data assimilation (DA) technique such as ensemble Kalman filter (EnKF) is employed to provide the analysis state of the system, which is then used to estimate the model error. Finally, an equation-discovery technique, here the relevance vector machine (RVM), a sparsity-promoting Bayesian method, is used to identify an interpretable, parsimonious, and closed-form representation of the model error. Using the chaotic Kuramoto-Sivashinsky (KS) system as the test case, we demonstrate the excellent performance of MEDIDA in discovering different types of structural/parametric model errors, representing different types of missing physics, using noise-free and noisy observations.

Submitted 2 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: 9 pages, 2 figures

Journal ref: Chaos 32, 061105 (2022)

arXiv:1906.08829 [pdf, other]

doi 10.5194/npg-27-373-2020

Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, ANN, and RNN-LSTM

Authors: Ashesh Chattopadhyay, Pedram Hassanzadeh, Devika Subramanian

Abstract: In this paper, the performance of three deep learning methods for predicting short-term evolution and for reproducing the long-term statistics of a multi-scale spatio-temporal Lorenz 96 system is examined. The methods are: echo state network (a type of reservoir computing, RC-ESN), deep feed-forward artificial neural network (ANN), and recurrent neural network with long short-term memory (RNN-LSTM). This Lorenz 96 system has three tiers of nonlinearly interacting variables representing slow/large-scale ($X$), intermediate ($Y$), and fast/small-scale ($Z$) processes. For training or testing, only $X$ is available; $Y$ and $Z$ are never known or used. We show that RC-ESN substantially outperforms ANN and RNN-LSTM for short-term prediction, e.g., accurately forecasting the chaotic trajectories for hundreds of numerical solver's time steps, equivalent to several Lyapunov timescales. The RNN-LSTM and ANN show some prediction skills as well; RNN-LSTM bests ANN. Furthermore, even after losing the trajectory, data predicted by RC-ESN and RNN-LSTM have probability density functions (PDFs) that closely match the true PDF, even at the tails. The PDF of the data predicted using ANN, however, deviates from the true PDF. Implications, caveats, and applications to data-driven and data-assisted surrogate modeling of complex nonlinear dynamical systems such as weather/climate are discussed.

Submitted 5 December, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: Some changes to the figures, the addition of an appendix, etc. have been made

Journal ref: Nonlin. Processes Geophys. 2020

arXiv:1812.09438

Data-driven Spatio-temporal Prediction of High-dimensional Geophysical Turbulence using Koopman Operator Approximation

Authors: M. A. Khodkar, Athanasios C. Antoulas, Pedram Hassanzadeh

Abstract: We show the skills of a data-driven low-dimensional linear model in predicting the spatio-temporal evolution of turbulent Rayleigh-Bénard convection. The model is based on dynamic mode decomposition with delay-embedding, which provides a data-driven finite-dimensional approximation to the system's Koopman operator. The model is built using vector-valued observables from direct numerical simulations, and can provide accurate predictions. Similar high prediction skills are found for the Kuramoto-Sivashinsky equation in the strongly-chaotic regimes.

Submitted 1 March, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: Figures and texts need to be revised

arXiv:1805.10577 [pdf, ps, other]

doi 10.1017/jfm.2018.586

Data-driven reduced modelling of turbulent Rayleigh-Benard convection using DMD-enhanced Fluctuation-Dissipation Theorem

Authors: M. A. Khodkar, Pedram Hassanzadeh

Abstract: A data-driven, model-free framework is introduced for calculating Reduced-Order Models (ROMs) capable of accurately predicting time-mean responses to external forcings, or forcings needed for specified responses, e.g., for control, in fully turbulent flows. The framework is based on using the Fluctuation-Dissipation Theorem (FDT) in the space of a limited number of modes obtained from Dynamic Mode Decomposition (DMD). Using the DMD modes as the basis functions, rather than the commonly used Proper Orthogonal Decomposition (POD) modes, resolves a previously identified problem in applying FDT to high-dimensional, non-normal turbulent flows. Employing this DMD-enhanced FDT method (FDT$_\mathrm{DMD}$), a 1D linear ROM with horizontally averaged temperature as state vector, is calculated for a 3D Rayleigh-Bénard convection system at the Rayleigh number of $10^6$ using data obtained from Direct Numerical Simulation (DNS). The calculated ROM performs well in various tests for this turbulent flow, suggesting FDT$_\mathrm{DMD}$ as a promising method for developing ROMs for high-dimensional, turbulent systems.

Submitted 6 September, 2018; v1 submitted 26 May, 2018; originally announced May 2018.

Comments: revised manuscript (accepted for publication)

Journal ref: Journal of Fluid Mechanics, 852, 2018

arXiv:1309.5542 [pdf, ps, other]

doi 10.1017/jfm.2014.306

Wall to Wall Optimal Transport

Authors: Pedram Hassanzadeh, Gregory P. Chini, Charles R. Doering

Abstract: The calculus of variations is employed to find steady divergence-free velocity fields that maximize transport of a tracer between two parallel walls held at fixed concentration for one of two constraints on flow strength: a fixed value of the kinetic energy or a fixed value of the enstrophy. The optimizing flows consist of an array of (convection) cells of a particular aspect ratio Gamma. We solve the nonlinear Euler-Lagrange equations analytically for weak flows and numerically (and via matched asymptotic analysis in the fixed energy case) for strong flows. We report the results in terms of the Nusselt number Nu, a dimensionless measure of the tracer transport, as a function of the Peclet number Pe, a dimensionless measure of the energy or enstrophy of the flow. For both constraints the maximum transport Nu_{MAX}(Pe) is realized in cells of decreasing aspect ratio Gamma_{opt}(Pe) as Pe increases. For the fixed energy problem, Nu_{MAX} \sim Pe and Gamma_{opt} \sim Pe^{-1/2}, while for the fixed enstrophy scenario, Nu_{MAX} \sim Pe^{10/17} and Gamma_{opt} \sim Pe^{-0.36}. We also interpret our results in the context of certain buoyancy-driven Rayleigh-Benard convection problems that satisfy one of the two intensity constraints, enabling us to investigate how the transport scalings compare with upper bounds on Nu expressed as a function of the Rayleigh number Ra. For steady convection in porous media, corresponding to the fixed energy problem, we find Nu_{MAX} \sim Ra and Gamma_{opt} \sim Ra^{-1/2}, while for steady convection in a pure fluid layer between free-slip isothermal walls, corresponding to fixed enstrophy transport, Nu_{MAX} \sim Ra^{5/12} and Gamma_{opt} \sim Ra^{-1/4}.

Submitted 13 April, 2014; v1 submitted 21 September, 2013; originally announced September 2013.

Comments: Revision submitted to the Journal of Fluid Mechanics

Journal ref: J. Fluid Mechanics, Volume 751, 2014

Showing 1–9 of 9 results for author: Hassanzadeh, P