-
Turning Up the Heat: Assessing 2-m Temperature Forecast Errors in AI Weather Prediction Models During Heat Waves
Authors:
Kelsey E. Ennis,
Elizabeth A. Barnes,
Marybeth C. Arcodia,
Martin A. Fernandez,
Eric D. Maloney
Abstract:
Extreme heat is the deadliest weather-related hazard in the United States. Furthermore, it is increasing in intensity, frequency, and duration, making skillful forecasts vital to protecting life and property. Traditional numerical weather prediction (NWP) models struggle with extreme heat for medium-range and subseasonal-to-seasonal (S2S) timescales. Meanwhile, artificial intelligence-based weather prediction (AIWP) models are progressing rapidly. However, it is largely unknown how well AIWP models forecast extremes, especially for medium-range and S2S timescales. This study investigates 2-m temperature forecasts for 60 heat waves across the four boreal seasons and over four CONUS regions at lead times up to 20 days, using two AIWP models (Google GraphCast and Pangu-Weather) and one traditional NWP model (NOAA Unified Forecast System Global Ensemble Forecast System (UFS GEFS)). First, case study analyses show that both AIWP models and the UFS GEFS exhibit consistent cold biases on regional scales in the 5-10 days of lead time before heat wave onset. GraphCast is the more skillful AIWP model, outperforming UFS GEFS and Pangu-Weather in most locations. Next, the two AIWP models are isolated and analyzed across all heat waves and seasons, with events split among the models' testing (2018-2023) and training (1979-2017) periods. There are cold biases before and during the heat waves in both models and all seasons, except Pangu-Weather in winter, which exhibits a mean warm bias before heat wave onset. Overall, results offer encouragement that AIWP models may be useful for medium-range and S2S predictability of extreme heat.
Submitted 29 April, 2025;
originally announced April 2025.
-
Predicting Tropical Cyclone Track Forecast Errors using a Probabilistic Neural Network
Authors:
M. A. Fernandez,
Elizabeth A. Barnes,
Randal J. Barnes,
Mark DeMaria,
Marie McGraw,
Galina Chirokova,
Lixin Lu
Abstract:
A new method for estimating tropical cyclone track uncertainty is presented and tested. This method uses a neural network to predict a bivariate normal distribution, which serves as an estimate for track uncertainty. We train the network and make predictions on forecasts from the National Hurricane Center (NHC), which currently uses static error distributions based on forecasts from the past five years for most applications. The neural network-based method produces uncertainty estimates that are dynamic and probabilistic. Further, the neural network-based method allows for probabilistic statements about tropical cyclone trajectories, including landfall probability, which we highlight. We show that our predictions are well calibrated using multiple metrics, that our method produces better uncertainty estimates than current NHC approaches, and that our method achieves similar performance to the Global Ensemble Forecast System. Once trained, the computational cost of predictions using this method is negligible, making it a strong candidate to improve the NHC's operational estimations of tropical cyclone track uncertainty.
Submitted 12 March, 2025;
originally announced March 2025.
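As a rough illustration of the bivariate normal idea in the abstract above, the sketch below evaluates the negative log-likelihood of an observed track error under a predicted bivariate normal distribution. A network that outputs the five distribution parameters could plausibly be trained by minimizing this quantity, but that is an assumption here: the abstract does not state the loss, and the function name, the argument names, and the (dx, dy) parameterization are all hypothetical.

```python
import numpy as np

def bivariate_normal_nll(dx, dy, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-likelihood of an error (dx, dy) under a bivariate normal.

    mu_x, mu_y: predicted mean error in each direction (hypothetical names)
    sigma_x, sigma_y: predicted standard deviations (must be > 0)
    rho: predicted correlation between the two directions (-1 < rho < 1)
    """
    zx = (dx - mu_x) / sigma_x
    zy = (dy - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho**2
    # Quadratic form in the exponent of the bivariate normal density.
    quad = (zx**2 - 2.0 * rho * zx * zy + zy**2) / one_minus_rho2
    # Log of the normalizing constant 2*pi*sigma_x*sigma_y*sqrt(1 - rho^2).
    log_norm = np.log(2.0 * np.pi * sigma_x * sigma_y) + 0.5 * np.log(one_minus_rho2)
    return log_norm + 0.5 * quad
```

All arguments may be scalars or NumPy arrays of matching shape, so the same function can score a whole batch of forecasts at once.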
-
Multi-Year-to-Decadal Temperature Prediction using a Machine Learning Model-Analog Framework
Authors:
M. A. Fernandez,
Elizabeth A. Barnes
Abstract:
Multi-year-to-decadal climate prediction is a key tool in understanding the range of potential regional and global climate futures. Here, we present a framework that combines machine learning and analog forecasting for predictions on these timescales. A neural network is used to learn a mask, specific to a region and lead time, with global weights based on relative importance as precursors to the evolution of that prediction target. A library of mask-weighted model states, or potential analogs, is then compared to a single mask-weighted observational state. The known future of the best matching potential analogs serves as the prediction for the future of the observational state. We match and predict 2-meter temperature using the Berkeley Earth Surface Temperature dataset for observations, and a set of CMIP6 models as the analog library. We find improved performance over traditional analog methods and initialized decadal predictions.
Submitted 24 February, 2025;
originally announced February 2025.
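The analog-matching step described in the abstract above can be sketched as follows, assuming the mask has already been learned. This is a minimal illustration, not the authors' implementation: the function name, the flattened-grid array layout, the mean-squared-error distance, and the choice to average the futures of the best analogs are all assumptions made for the example.

```python
import numpy as np

def analog_forecast(obs_state, library_states, library_futures, mask, n_analogs=3):
    """Pick the library states closest to the observation under a weighting mask.

    obs_state: (n_grid,) flattened observed map
    library_states: (n_library, n_grid) flattened model maps (potential analogs)
    library_futures: (n_library, ...) known future of each library state
    mask: (n_grid,) non-negative weights (assumed already learned)
    """
    weighted_obs = obs_state * mask
    weighted_lib = library_states * mask  # mask broadcasts over the library axis
    # Mean squared difference between each weighted analog and the weighted observation.
    mse = np.mean((weighted_lib - weighted_obs) ** 2, axis=1)
    best = np.argsort(mse)[:n_analogs]
    # Assumption: the prediction is the mean future of the selected analogs.
    return library_futures[best].mean(axis=0), best
```

Grid points with zero mask weight do not influence the match at all, which is what lets a region- and lead-time-specific mask focus the comparison on the relevant precursors.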
-
Recommendations for Comprehensive and Independent Evaluation of Machine Learning-Based Earth System Models
Authors:
Paul A. Ullrich,
Elizabeth A. Barnes,
William D. Collins,
Katherine Dagon,
Shiheng Duan,
Joshua Elms,
Jiwoo Lee,
L. Ruby Leung,
Dan Lu,
Maria J. Molina,
Travis A. O'Brien,
Finn O. Rebassoo
Abstract:
Machine learning (ML) is a revolutionary technology with demonstrable applications across multiple disciplines. Within the Earth science community, ML has been most visible for weather forecasting, producing forecasts that rival modern physics-based models. Given the importance of deepening our understanding and improving predictions of the Earth system on all time scales, efforts are now underway to develop forecasting models into Earth-system models (ESMs), capable of representing all components of the coupled Earth system (or their aggregated behavior) and their response to external changes. Modeling the Earth system is a much more difficult problem than weather forecasting, not least because the model must represent the alternate (e.g., future) coupled states of the system for which there are no historical observations. Given that the physical principles that enable predictions about the response of the Earth system are often not explicitly coded in these ML-based models, demonstrating the credibility of ML-based ESMs thus requires us to build evidence of their consistency with the physical system. To this end, this paper puts forward five recommendations to enhance comprehensive, standardized, and independent evaluation of ML-based ESMs to strengthen their credibility and promote their wider use.
Submitted 6 January, 2025; v1 submitted 24 October, 2024;
originally announced October 2024.
-
Carefully choose the baseline: Lessons learned from applying XAI attribution methods for regression tasks in geoscience
Authors:
Antonios Mamalakis,
Elizabeth A. Barnes,
Imme Ebert-Uphoff
Abstract:
Methods of eXplainable Artificial Intelligence (XAI) are used in geoscientific applications to gain insights into the decision-making strategy of Neural Networks (NNs) highlighting which features in the input contribute the most to a NN prediction. Here, we discuss our lesson learned that the task of attributing a prediction to the input does not have a single solution. Instead, the attribution results and their interpretation depend greatly on the considered baseline (sometimes referred to as reference point) that the XAI method utilizes; a fact that has been overlooked so far in the literature. This baseline can be chosen by the user or it is set by construction in the method's algorithm, often without the user being aware of that choice. We highlight that different baselines can lead to different insights for different science questions and, thus, should be chosen accordingly. To illustrate the impact of the baseline, we use a large ensemble of historical and future climate simulations forced with the SSP3-7.0 scenario and train a fully connected NN to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We then use various XAI methods and different baselines to attribute the network predictions to the input. We show that attributions differ substantially when considering different baselines, as they correspond to answering different science questions. We conclude by discussing some important implications and considerations about the use of baselines in XAI research.
Submitted 19 August, 2022;
originally announced August 2022.
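The baseline dependence discussed in the abstract above is easiest to see for a purely linear model, where baseline-relative methods such as Integrated Gradients reduce to w_i (x_i - b_i) for feature i. The toy example below uses that closed form; the weights, input, and baselines are made-up numbers, and the linear model is a stand-in for illustration, not the network studied in the paper.

```python
import numpy as np

def linear_attribution(weights, x, baseline):
    """Attribution of f(x) = weights . x relative to a baseline, per feature."""
    return weights * (x - baseline)

# Hypothetical toy values.
weights = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 4.0])

zero_baseline = np.zeros_like(x)   # "what contributes relative to an all-zero input?"
mean_baseline = np.ones_like(x)    # "what contributes relative to a reference state?"

attr_zero = linear_attribution(weights, x, zero_baseline)
attr_mean = linear_attribution(weights, x, mean_baseline)

# In both cases the attributions sum to f(x) - f(baseline),
# yet the per-feature values differ because the question differs.
f = lambda v: float(weights @ v)
gap_zero = f(x) - f(zero_baseline)
gap_mean = f(x) - f(mean_baseline)
```

Here the first feature receives an attribution of 2.0 under the zero baseline and 0.0 under the ones baseline: the same input and model, but two different science questions.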
-
Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience
Authors:
Antonios Mamalakis,
Elizabeth A. Barnes,
Imme Ebert-Uphoff
Abstract:
Convolutional neural networks (CNNs) have recently attracted great attention in geoscience due to their ability to capture non-linear system behavior and extract predictive spatiotemporal patterns. Given their black-box nature however, and the importance of prediction explainability, methods of explainable artificial intelligence (XAI) are gaining popularity as a means to explain the CNN decision-making strategy. Here, we establish an intercomparison of some of the most popular XAI methods and investigate their fidelity in explaining CNN decisions for geoscientific applications. Our goal is to raise awareness of the theoretical limitations of these methods and gain insight into the relative strengths and weaknesses to help guide best practices. The considered XAI methods are first applied to an idealized attribution benchmark, where the ground truth of explanation of the network is known a priori, to help objectively assess their performance. Secondly, we apply XAI to a climate-related prediction setting, namely to explain a CNN that is trained to predict the number of atmospheric rivers in daily snapshots of climate simulations. Our results highlight several important issues of XAI methods (e.g., gradient shattering, inability to distinguish the sign of attribution, ignorance to zero input) that have previously been overlooked in our field and, if not considered cautiously, may lead to a distorted picture of the CNN decision-making strategy. We envision that our analysis will motivate further investigation into XAI fidelity and will help towards a cautious implementation of XAI in geoscience, which can lead to further exploitation of CNNs and deep learning for prediction problems.
Submitted 5 September, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Controlled abstention neural networks for identifying skillful predictions for classification problems
Authors:
Elizabeth A. Barnes,
Randal J. Barnes
Abstract:
The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed "forecasts of opportunity." When these opportunities are not present, scientists need prediction systems that are capable of saying "I don't know." We introduce a novel loss function, termed the "NotWrong loss", that allows neural networks to identify forecasts of opportunity for classification problems. The NotWrong loss introduces an abstention class that allows the network to identify the more confident samples and abstain (say "I don't know") on the less confident samples. The abstention loss is designed to abstain on a user-defined fraction of the samples via a PID controller. Unlike many machine learning methods used to reject samples post-training, the NotWrong loss is applied during training to preferentially learn from the more confident samples. We show that the NotWrong loss outperforms other existing loss functions for multiple climate use cases. The implementation of the proposed loss function is straightforward in most network architectures designed for classification as it only requires the addition of an abstention class to the output layer and modification of the loss function.
Submitted 16 April, 2021;
originally announced April 2021.
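The abstract above mentions a PID controller that steers training toward a user-defined abstention fraction. A generic discrete PID loop for that purpose might look like the sketch below. Everything specific here is an assumption: the class name, the gains, and the idea that the controlled quantity is a penalty weight `alpha` that grows when the network abstains too often are illustrative choices, not details taken from the paper.

```python
class AbstentionPID:
    """Discrete PID controller nudging a penalty weight toward a target abstention fraction."""

    def __init__(self, setpoint, kp=1.0, ki=0.1, kd=0.0, alpha_init=1.0):
        self.setpoint = setpoint      # desired fraction of abstained samples
        self.kp, self.ki, self.kd = kp, ki, kd
        self.alpha_init = alpha_init  # penalty weight when the error history is zero
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, abstention_fraction):
        """Return a new penalty weight given the fraction abstained in the last batch."""
        # Positive error means the network abstains more often than requested.
        error = abstention_fraction - self.setpoint
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        alpha = (self.alpha_init
                 + self.kp * error
                 + self.ki * self.integral
                 + self.kd * derivative)
        # Keep the penalty non-negative.
        return max(alpha, 0.0)
```

In a training loop one would call `update` once per batch or epoch with the measured abstention fraction and feed the returned weight into whatever term of the loss discourages abstention.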
-
Controlled abstention neural networks for identifying skillful predictions for regression problems
Authors:
Elizabeth A. Barnes,
Randal J. Barnes
Abstract:
The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed "forecasts of opportunity". When these opportunities are not present, scientists need prediction systems that are capable of saying "I don't know." We introduce a novel loss function, termed "abstention loss", that allows neural networks to identify forecasts of opportunity for regression problems. The abstention loss works by incorporating uncertainty in the network's prediction to identify the more confident samples and abstain (say "I don't know") on the less confident samples. The abstention loss is designed to determine the optimal abstention fraction, or abstain on a user-defined fraction via a PID controller. Unlike many methods for attaching uncertainty to neural network predictions post-training, the abstention loss is applied during training to preferentially learn from the more confident samples. The abstention loss is built upon a standard computer science method. While the standard approach is itself a simple yet powerful tool for incorporating uncertainty in regression problems, we demonstrate that the abstention loss outperforms this more standard method for the synthetic climate use cases explored here. The implementation of the proposed loss function is straightforward in most network architectures designed for regression, as it only requires modification of the output layer and loss function.
Submitted 16 April, 2021;
originally announced April 2021.
-
Neural Network Attribution Methods for Problems in Geoscience: A Novel Synthetic Benchmark Dataset
Authors:
Antonios Mamalakis,
Imme Ebert-Uphoff,
Elizabeth A. Barnes
Abstract:
Despite the increasingly successful application of neural networks to many problems in the geosciences, their complex and nonlinear structure makes the interpretation of their predictions difficult, which limits model trust and does not allow scientists to gain physical insights about the problem at hand. Many different methods have been introduced in the emerging field of eXplainable Artificial Intelligence (XAI), which aim at attributing the network's prediction to specific features in the input domain. XAI methods are usually assessed by using benchmark datasets (like MNIST or ImageNet for image classification). However, an objective, theoretically derived ground truth for the attribution is lacking for most of these datasets, making the assessment of XAI in many cases subjective. Also, benchmark datasets specifically designed for problems in geosciences are rare. Here, we provide a framework, based on the use of additively separable functions, to generate attribution benchmark datasets for regression problems for which the ground truth of the attribution is known a priori. We generate a large benchmark dataset and train a fully connected network to learn the underlying function that was used for simulation. We then compare estimated heatmaps from different XAI methods to the ground truth in order to identify examples where specific XAI methods perform well or poorly. We believe that attribution benchmarks such as the ones introduced herein are of great importance for further application of neural networks in the geosciences, and for more objective assessment and accurate implementation of XAI methods, which will increase model trust and assist in discovering new science.
Submitted 10 June, 2022; v1 submitted 17 March, 2021;
originally announced March 2021.
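The additively separable construction named in the abstract above can be illustrated with a small generator. If the target is F(X) = f_1(x_1) + ... + f_d(x_d), then the contribution of feature i to a given sample is simply f_i(x_i), so the attribution ground truth is known by construction. The sketch below assumes simple quadratic local functions with f_i(0) = 0; that functional form, the function name, and the Gaussian inputs are hypothetical choices for the example and are not taken from the paper.

```python
import numpy as np

def make_benchmark(n_samples, n_features, seed=0):
    """Generate inputs, targets, and ground-truth attributions for a separable function."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    # One hypothetical local function per feature: f_i(x) = a_i * x + b_i * x**2.
    a = rng.standard_normal(n_features)
    b = rng.standard_normal(n_features)
    contributions = a * X + b * X**2   # shape (n_samples, n_features)
    # The target is the sum of the per-feature contributions.
    y = contributions.sum(axis=1)
    return X, y, contributions

X, y, truth = make_benchmark(n_samples=5, n_features=4)
```

A network trained on `(X, y)` can then be probed with any XAI method, and the resulting heatmaps compared against `truth` sample by sample.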
-
Will Artificial Intelligence supersede Earth System and Climate Models?
Authors:
Christopher Irrgang,
Niklas Boers,
Maike Sonnewald,
Elizabeth A. Barnes,
Christopher Kadow,
Joanna Staneva,
Jan Saynisch-Wagner
Abstract:
We outline a perspective of an entirely new research branch in Earth and climate sciences, where deep neural networks and Earth system models are dismantled as individual methodological approaches and reassembled as learning, self-validating, and interpretable Earth system model-network hybrids. Following this path, we coin the term "Neural Earth System Modelling" (NESYM) and highlight the necessity of a transdisciplinary discussion platform, bringing together Earth and climate scientists, big data analysts, and AI experts. We examine the concurrent potential and pitfalls of Neural Earth System Modelling and discuss the open question of whether artificial intelligence will not only infuse Earth system modelling, but ultimately render it obsolete.
Submitted 22 January, 2021;
originally announced January 2021.
-
Physically Interpretable Neural Networks for the Geosciences: Applications to Earth System Variability
Authors:
Benjamin A. Toms,
Elizabeth A. Barnes,
Imme Ebert-Uphoff
Abstract:
Neural networks have become increasingly prevalent within the geosciences, although a common limitation of their usage has been a lack of methods to interpret what the networks learn and how they make decisions. As such, neural networks have often been used within the geosciences to most accurately identify a desired output given a set of inputs, with the interpretation of what the network learns used as a secondary metric to ensure the network is making the right decision for the right reason. Neural network interpretation techniques have become more advanced in recent years, however, and we therefore propose that the ultimate objective of using a neural network can also be the interpretation of what the network has learned rather than the output itself.
We show that the interpretation of neural networks can enable the discovery of scientifically meaningful connections within geoscientific data. In particular, we use two methods for neural network interpretation called backwards optimization and layerwise relevance propagation (LRP), both of which project the decision pathways of a network back onto the original input dimensions. To the best of our knowledge, LRP has not yet been applied to geoscientific research, and we believe it has great potential in this area. We show how these interpretation techniques can be used to reliably infer scientifically meaningful information from neural networks by applying them to common climate patterns. These results suggest that combining interpretable neural networks with novel scientific hypotheses will open the door to many new avenues in neural network-related geoscience research.
Submitted 27 May, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.