-
Information-Theoretic Methods for Identifying Relationships among Climate Variables
Authors:
Kevin H. Knuth,
Deniz Gençağa,
William B. Rossow
Abstract:
Information-theoretic quantities, such as entropy, are used to quantify the amount of information a given variable provides. Entropies can be used together to compute the mutual information, which quantifies the amount of information two variables share. However, accurately estimating these quantities from data is extremely challenging. We have developed a set of computational techniques that allow one to accurately compute marginal and joint entropies. These algorithms are probabilistic in nature and thus provide information on the uncertainty in our estimates, which enables us to establish the statistical significance of our findings. We demonstrate these methods by identifying relations between cloud data from the International Satellite Cloud Climatology Project (ISCCP) and data from other sources, such as equatorial Pacific sea surface temperatures (SST).
Submitted 19 December, 2014;
originally announced December 2014.
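The relation between entropies and mutual information described in this abstract, I(X;Y) = H(X) + H(Y) - H(X,Y), can be illustrated with a minimal sketch. This is not the authors' probabilistic estimator: it is a plain plug-in histogram estimate with a fixed, arbitrarily chosen number of bins, and the function names `entropy` and `mutual_information` are hypothetical.

```python
import numpy as np

def entropy(counts):
    """Plug-in Shannon entropy in bits from an array of bin counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()   # drop empty bins, normalise
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y, bins=8):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from a 2-D histogram.

    The bin count is an assumption of this sketch, not a value from the paper.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy(joint.sum(axis=1))        # marginal entropy of x
    h_y = entropy(joint.sum(axis=0))        # marginal entropy of y
    h_xy = entropy(joint.ravel())           # joint entropy
    return h_x + h_y - h_xy
```

A plug-in estimate like this gives a single number with no error bar, which is the gap the probabilistic approach in the abstract is meant to address.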
-
Survey On The Estimation Of Mutual Information Methods as a Measure of Dependency Versus Correlation Analysis
Authors:
D. Gencaga,
N. K. Malakar,
D. J. Lary
Abstract:
In this survey, we present and compare different approaches to estimating Mutual Information (MI) from data to analyse general dependencies between variables of interest in a system. We demonstrate the performance difference of MI versus correlation analysis, which is only optimal in the case of linear dependencies. First, we use a piecewise-constant Bayesian methodology with a general Dirichlet prior. In this estimation method, we use a two-stage approach where we first approximate the probability distribution and then calculate the marginal and joint entropies. Here, we demonstrate the performance of this Bayesian approach versus the others for computing the dependency between different variables. We also compare these with linear correlation analysis. Finally, we apply MI and correlation analysis to the identification of the bias in the determination of the aerosol optical depth (AOD) by the satellite-based Moderate Resolution Imaging Spectroradiometer (MODIS) and the ground-based AErosol RObotic NETwork (AERONET). Here, we observe that the AOD measurements by these two instruments might be different for the same location. The reason for this bias is explored by quantifying the dependencies between the bias and 15 other variables, including cloud cover and surface reflectivity.
Submitted 14 January, 2014;
originally announced January 2014.
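The two-stage idea in this abstract (approximate the distribution with a piecewise-constant model under a Dirichlet prior, then compute entropies) can be sketched as follows. The symmetric prior parameter `alpha`, the fixed binning, and the function name `dirichlet_entropy_samples` are assumptions made for illustration; the survey's actual prior and bin selection may differ.

```python
import numpy as np

def dirichlet_entropy_samples(counts, alpha=1.0, n_samples=2000, seed=0):
    """Draw entropy values (bits) from the posterior over bin probabilities.

    Stage 1: with bin counts n_i and a symmetric Dirichlet(alpha) prior
             (an assumption of this sketch), the posterior over the bin
             probabilities is Dirichlet(n_i + alpha).
    Stage 2: compute the Shannon entropy of each posterior draw.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    probs = rng.dirichlet(counts + alpha, size=n_samples)  # (n_samples, K)
    safe = np.where(probs > 0, probs, 1.0)                 # avoid log2(0)
    return -(probs * np.log2(safe)).sum(axis=1)
```

The mean of the returned samples serves as a point estimate and their standard deviation as a rough uncertainty, for example `samples.mean()` and `samples.std()` for counts obtained from `np.histogram`.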
-
Towards Identification of Relevant Variables in the observed Aerosol Optical Depth Bias between MODIS and AERONET observations
Authors:
N. K. Malakar,
D. J. Lary,
D. Gencaga,
A. Albayrak,
J. Wei
Abstract:
Measurements made by satellite remote sensing with the Moderate Resolution Imaging Spectroradiometer (MODIS) and by the globally distributed Aerosol Robotic Network (AERONET) are compared. Comparison of the aerosol optical depth values from the two datasets shows that there are biases between the two data products. In this paper, we present a general framework for identifying the set of relevant variables responsible for the observed bias. The possible factors influencing the bias might be associated with measurement conditions such as the solar and sensor zenith angles, the solar and sensor azimuth, scattering angles, and surface reflectivity at the various measured wavelengths. Specifically, we performed the analysis for the remote sensing Aqua-Land data set and used a machine learning technique, a neural network in this case, to perform multivariate regression between the ground truth and the training data sets. Finally, we used the mutual information between the observed and the predicted values as the measure of similarity to identify the most relevant set of variables. The search is a brute-force method, as we have to consider all possible combinations. The computation involves a large amount of number crunching, and we implemented it by writing a job-parallel program.
Submitted 12 February, 2013;
originally announced February 2013.
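The brute-force search described in this abstract (fit a regression on every combination of candidate variables, then score each by the mutual information between observed and predicted values) can be sketched in a few lines. To stay self-contained, this sketch swaps the paper's neural network for an ordinary least-squares fit and uses a plug-in histogram MI; both substitutions, the bin count, and the names `plugin_mi` and `rank_subsets` are assumptions, and the loop is serial rather than job-parallel.

```python
from itertools import combinations
import numpy as np

def plugin_mi(a, b, bins=8):
    """Plug-in mutual information (bits) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    def h(c):
        p = c[c > 0] / c.sum()
        return -(p * np.log2(p)).sum()
    return float(h(joint.sum(axis=1)) + h(joint.sum(axis=0)) - h(joint.ravel()))

def rank_subsets(X, y, size):
    """Score every column subset of the given size, best first.

    A linear least-squares fit stands in for the neural network regression.
    """
    scores = []
    for cols in combinations(range(X.shape[1]), size):
        A = np.column_stack([X[:, cols], np.ones(len(y))])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        scores.append((plugin_mi(y, A @ coef), cols))
    return sorted(scores, reverse=True)
```

Because the number of subsets grows combinatorially with the number of candidate variables, each subset's fit is independent of the others, which is what makes a job-parallel implementation natural.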