-
A Practical Introduction to Regression-based Causal Inference in Meteorology (I): All confounders measured
Authors:
Caren Marzban,
Yikun Zhang,
Nicholas Bond,
Michael Richman
Abstract:
Whether a variable is the cause of another, or simply associated with it, is often an important scientific question. Causal Inference is the name associated with the body of techniques for addressing that question in a statistical setting. Although assessing causality is relatively straightforward in the presence of temporal information, outside of that setting, which is the situation considered here, it is more difficult to assess causal effects. The development of the field of causal inference has involved concepts from a wide range of topics, thereby limiting its adoption across some fields, including meteorology. However, at its core, the requisite knowledge for causal inference involves little more than basic probability theory and regression, topics familiar to most meteorologists. By focusing on these core areas, this and a companion article provide a stepping stone for the meteorology community into the field of (non-temporal) causal inference. Although some theoretical foundations are presented, the main goal is the application of a specific method, called matching, to a problem in meteorology. The data for the application are in the public domain, and R code is provided as well, forming an easy path for meteorology students and researchers to enter the field.
Submitted 24 June, 2025; v1 submitted 23 June, 2025;
originally announced June 2025.
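The matching idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' R code or their meteorological data: the synthetic data, the single confounder `c`, the true effect of 1, and the nearest-neighbor rule are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Synthetic data (hypothetical): one measured confounder c drives both the
# treatment t and the outcome y. The true causal effect of t on y is 1.
c = rng.normal(size=n)
t = rng.random(n) < 1.0 / (1.0 + np.exp(-c))  # treatment is more likely when c is large
y = 1.0 * t + 2.0 * c + rng.normal(size=n)

# Naive estimate: difference in mean outcome, ignoring the confounder.
naive = y[t].mean() - y[~t].mean()

# Matching: pair each treated case with the control case whose confounder
# value is closest (nearest neighbor, with replacement).
c_treated, y_treated = c[t], y[t]
c_control, y_control = c[~t], y[~t]
nearest = np.abs(c_treated[:, None] - c_control[None, :]).argmin(axis=1)

# Average effect among the treated: mean outcome gap within matched pairs.
matched = (y_treated - y_control[nearest]).mean()

print(f"naive difference: {naive:.2f}")
print(f"matched estimate: {matched:.2f} (true effect is 1.0)")
```

Because the matched pairs share nearly the same value of the confounder, the within-pair difference in outcome is no longer driven by it, so the matched estimate should land much closer to the true effect than the naive difference does.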
-
A Practical Introduction to Regression-based Causal Inference in Meteorology (II): Unmeasured confounders
Authors:
Caren Marzban,
Yikun Zhang,
Nicholas Bond,
Michael Richman
Abstract:
One obstacle to "elevating" correlation to causation is the phenomenon of confounding, i.e., when a correlation between two variables exists because both variables are in fact caused by a third variable. The situation where the confounders are measured is examined in an earlier, accompanying article. Here, it is shown that even when the confounding variables are not measured, it is still possible to estimate the causal effect via a regression-based method that uses the notion of Instrumental Variables. Using a meteorological data set similar to that in the sister article, a number of different estimates of the causal effect are compared and contrasted. It is shown that the Instrumental Variable results based on unmeasured confounders are consistent with those of the sister article where confounders are measured.
Submitted 23 June, 2025;
originally announced June 2025.
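As a rough illustration of the Instrumental Variable idea described in the abstract above, the sketch below uses two-stage least squares with a single instrument. It is not the authors' analysis: the synthetic variables `u`, `z`, `x`, `y`, the true effect of 2, and the helper `slope` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Synthetic data (hypothetical). u is an unmeasured confounder of x and y.
# z is an instrument: it affects x, but reaches y only through x.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of x on y is 2


def slope(a, b):
    """Least-squares slope from regressing b on a (with an intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


# Ordinary regression of y on x is biased because u is left out.
ols = slope(x, y)

# Stage 1: regress x on z and keep the fitted values.
x_hat = x.mean() + slope(z, x) * (z - z.mean())
# Stage 2: regress y on the fitted values, which vary only through z.
iv = slope(x_hat, y)

print(f"ordinary regression slope: {ols:.2f}")
print(f"two-stage (IV) slope:      {iv:.2f} (true effect is 2.0)")
```

The fitted values from the first stage carry only the part of `x` that is explained by the instrument, which is unrelated to `u` by construction, so the second-stage slope is not contaminated by the unmeasured confounder.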
-
Principal Component Analysis for Equation Discovery
Authors:
Caren Marzban,
Ulvi Yurtsever,
Michael Richman
Abstract:
Principal Component Analysis (PCA) is one of the most commonly used statistical methods for data exploration, and for dimensionality reduction wherein the first few principal components account for an appreciable proportion of the variability in the data. Less commonly, attention is paid to the last principal components because they do not account for an appreciable proportion of variability. However, this defining characteristic of the last principal components also qualifies them as combinations of variables that are constant across the cases. Such constant-combinations are important because they may reflect underlying laws of nature. In situations involving a large number of noisy covariates, the underlying law may not correspond to the last principal component, but rather to one of the last. Consequently, a criterion is required to identify the relevant eigenvector. In this paper, two examples are employed to demonstrate the proposed methodology; one from Physics, involving a small number of covariates, and another from Meteorology wherein the number of covariates is in the thousands. It is shown that with an appropriate selection criterion, PCA can be employed to "discover" Kepler's third law (in the former), and the hypsometric equation (in the latter).
Submitted 9 January, 2024;
originally announced January 2024.
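The point in the abstract above, that the last principal component picks out a combination of variables that is nearly constant across cases, can be sketched on the two-variable Kepler example. This is only an illustration under assumptions not taken from the paper: the log transform, the use of approximate textbook values for the eight planets, and the absence of any selection criterion (with two variables there is only one "last" component).

```python
import numpy as np

# Approximate semi-major axis (AU) and orbital period (years) for the eight
# planets, Mercury through Neptune. Standard rounded values, used as an
# illustrative data set.
a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537, 19.191, 30.069])
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457, 84.011, 164.79])

# Assumption: work with logarithms, so that a power law becomes a linear
# combination of the variables.
data = np.column_stack([np.log(a), np.log(T)])
centered = data - data.mean(axis=0)

# PCA via the eigen-decomposition of the covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so column 0 of the
# eigenvector matrix is the last principal component (smallest variance).
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
v = eigvecs[:, 0]

# Rescale so the coefficient on log T is 2; the coefficient on log a is then
# close to -3, i.e. 2 log T - 3 log a is nearly constant (T^2 proportional to a^3).
coef = v * (2.0 / v[1])
ratio = v[0] / v[1]

print("share of variance in last component:", eigvals[0] / eigvals.sum())
print("coefficients on (log a, log T):", coef)
```

The tiny share of variance in the last component is what signals a constant combination, and its coefficients give the exponents of the underlying law up to an overall scale.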