-
Causal machine learning for sustainable agroecosystems
Authors:
Vasileios Sitokonstantinou,
Emiliano Díaz Salas Porras,
Jordi Cerdà Bautista,
Maria Piles,
Ioannis Athanasiadis,
Hannah Kerner,
Giulia Martini,
Lily-belle Sweet,
Ilias Tsoumas,
Jakob Zscheischler,
Gustau Camps-Valls
Abstract:
In a changing climate, sustainable agriculture is essential for food security and environmental health. However, it is challenging to understand the complex interactions among its biophysical, social, and economic components. Predictive machine learning (ML), with its capacity to learn from data, is leveraged in sustainable agriculture for applications like yield prediction and weather forecasting. Nevertheless, it cannot explain causal mechanisms and remains descriptive rather than prescriptive. To address this gap, we propose causal ML, which merges ML's data processing with causality's ability to reason about change. This facilitates quantifying intervention impacts for evidence-based decision-making and enhances predictive model robustness. We showcase causal ML through eight diverse applications that benefit stakeholders across the agri-food chain, including farmers, policymakers, and researchers.
Submitted 23 August, 2024;
originally announced August 2024.
-
Classifying active and inactive states of growing rabbits from accelerometer data using machine learning algorithms
Authors:
Mónica Mora,
Lucile Riaboff,
Ingrid David,
Juan Pablo Sánchez,
Miriam Piles
Abstract:
This study explores how wearable accelerometers, small devices that measure acceleration, can help monitor the activity of growing rabbits. We equipped 16 rabbits with these devices and filmed them for two weeks. By watching the videos and using specialized software, we determined what the rabbits were doing: lying down, eating, moving around, and more. These activities were grouped into two states: active or inactive. This information, together with the acceleration data, was then used to teach a computer program to recognize when the rabbits were active or not. This technology offers a reliable way to understand rabbit behavior, which could lead to better management practices in animal production.
Submitted 28 June, 2024;
originally announced July 2024.
-
Let's consider more general nonlinear approaches to study teleconnections of climate variables
Authors:
D. Bueso,
M. Piles,
G. Camps-Valls
Abstract:
The recent work by Rieger et al. (2021) is concerned with the problem of extracting features from spatio-temporal geophysical signals. The authors introduce the complex rotated MCA (xMCA) to deal with lagged effects and non-orthogonality of the feature representation. This method essentially (1) transforms the signals to a complex plane with the Hilbert transform; (2) applies an oblique (Varimax and Promax) rotation to remove the orthogonality constraint; and (3) performs the eigendecomposition in this complex space (Horel et al., 1984). We argue that this method is essentially a particular case of the method called rotated complex kernel principal component analysis (ROCK-PCA) introduced in Bueso et al. (2019, 2020), where we proposed the same approach: first transform the data to the complex plane with the Hilbert transform and then apply the varimax rotation, with the only difference being that the eigendecomposition is performed in the dual (kernel) Hilbert space. The latter allows us to generalize the xMCA solution by extracting nonlinear (curvilinear) features when nonlinear kernel functions are used. Hence, the xMCA solution coincides with that of ROCK-PCA when the inner product is computed in the input data space instead of in the high-dimensional (possibly infinite-dimensional) kernel Hilbert space to which the data have been mapped. In this short correspondence, we provide a theoretical proof that xMCA is a special case of ROCK-PCA and quantitative evidence that more expressive and informative features can be extracted when working with kernels; results of the decomposition of global sea surface temperature (SST) fields are shown to illustrate the capability of ROCK-PCA to cope with nonlinear processes, unlike xMCA.
Submitted 15 December, 2022;
originally announced December 2022.
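The Hilbert-transform and complex eigendecomposition steps named in the abstract above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' xMCA or ROCK-PCA code: it uses a linear (input-space) inner product, omits the Varimax/Promax rotation entirely, and the function names `analytic_signal` and `complex_pca` are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal along axis 0 of a (time, space) array, via the FFT.

    Hypothetical helper: negative frequencies are zeroed and positive
    frequencies doubled, which is the usual discrete Hilbert construction.
    """
    n = x.shape[0]
    spectrum = np.fft.fft(x, axis=0)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain[:, None], axis=0)

def complex_pca(x):
    """Eigendecomposition of the Hermitian covariance of the analytic signal.

    No rotation is applied here (an assumption made to keep the sketch short).
    Returns the explained-variance fractions and the complex spatial modes.
    """
    z = analytic_signal(x)
    z = z - z.mean(axis=0)
    cov = z.conj().T @ z / (z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    return eigvals / eigvals.sum(), eigvecs[:, order]

# Toy data (an assumption): a wave travelling across 10 locations.
t = np.arange(200)[:, None]
s = np.arange(10)[None, :]
wave = np.cos(2 * np.pi * 5 * t / 200 - 0.6 * s)
explained, modes = complex_pca(wave)
```

For this toy travelling wave, a single complex mode captures the propagating pattern, whereas a real-valued PCA would need a pair of modes in quadrature. A kernel variant would replace the covariance with a kernel matrix computed on the complex data.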
-
Persistence in Complex Systems
Authors:
S. Salcedo-Sanz,
D. Casillas-Pérez,
J. Del Ser,
C. Casanova-Mateo,
L. Cuadra,
M. Piles,
G. Camps-Valls
Abstract:
Persistence is an important characteristic of many complex systems in nature, related to how long the system remains at a certain state before changing to a different one. The study of complex systems' persistence involves different definitions and uses different techniques, depending on whether short-term or long-term persistence is considered. In this paper we discuss the most important definitions, concepts, methods, literature and latest results on persistence in complex systems. Firstly, the most used definitions of persistence in short-term and long-term cases are presented. The most relevant methods to characterize persistence are then discussed in both cases. A complete literature review is also carried out. We also present and discuss some relevant results on persistence, and give empirical evidence of performance in different detailed case studies, for both short-term and long-term persistence. A perspective on the future of persistence concludes the work.
Submitted 11 April, 2022;
originally announced May 2022.
-
Integrating Domain Knowledge in Data-driven Earth Observation with Process Convolutions
Authors:
Daniel Heestermans Svendsen,
Maria Piles,
Jordi Muñoz-Marí,
David Luengo,
Luca Martino,
Gustau Camps-Valls
Abstract:
The modelling of Earth observation data is a challenging problem, typically approached by either purely mechanistic or purely data-driven methods. Mechanistic models encode the domain knowledge and physical rules governing the system. Such models, however, need the correct specification of all interactions between variables in the problem, and the appropriate parameterization is a challenge in itself. On the other hand, machine learning approaches are flexible data-driven tools, able to approximate arbitrarily complex functions, but they lack interpretability and struggle when data are scarce or in extrapolation regimes. In this paper, we argue that hybrid learning schemes that combine both approaches can address all these issues efficiently. We introduce Gaussian process (GP) convolution models for hybrid modelling in Earth observation (EO) problems. We specifically propose the use of a class of GP convolution models called latent force models (LFMs) for EO time series modelling, analysis and understanding. LFMs are hybrid models that incorporate physical knowledge encoded in differential equations into a multioutput GP model. LFMs can transfer information across time series, cope with missing observations, infer explicit latent functions forcing the system, and learn parameterizations that are very helpful for system analysis and interpretability. We consider time series of soil moisture from active (ASCAT) and passive (SMOS, AMSR2) microwave satellites. We show how, by assuming a first-order differential equation as the governing equation, the model automatically estimates the e-folding time or decay rate related to soil moisture persistence and discovers latent forces related to precipitation. The proposed hybrid methodology reconciles the two main approaches in remote sensing parameter estimation by blending statistical learning and mechanistic modeling.
Submitted 16 April, 2021;
originally announced April 2021.
-
A global Canopy Water Content product from AVHRR/Metop
Authors:
Francisco Javier García-Haro,
Manuel Campos-Taberner,
Álvaro Moreno,
Håkan Torbern Tagesson,
Fernando Camacho,
Beatriz Martínez,
Sergio Sánchez,
María Piles,
Gustau Camps-Valls,
Marta Yeba,
María Amparo Gilabert
Abstract:
Spatially and temporally explicit canopy water content (CWC) data are important for monitoring vegetation status, and constitute essential information for studying ecosystem-climate interactions. Despite many efforts there is currently no operational CWC product available to users. In the context of the Satellite Application Facility for Land Surface Analysis (LSA-SAF), we have developed an algorithm to produce a global dataset of CWC based on data from the Advanced Very High Resolution Radiometer (AVHRR) sensor on board Meteorological Operational (MetOp) satellites forming the EUMETSAT Polar System (EPS). CWC reflects the water conditions at the leaf level and information related to canopy structure. An accuracy assessment of the EPS/AVHRR CWC indicated a close agreement with multi-temporal ground data from SMAPVEX16 in Canada and Dahra in Senegal. The present study further evaluates the consistency of the LSA-SAF product with respect to the Simplified Level 2 Product Prototype Processor (SL2P) product, and demonstrates its applicability at different spatio-temporal resolutions using optical data from MSI/Sentinel-2 and MODIS/Terra and Aqua. We conclude that the EPS/AVHRR CWC product is a promising tool for monitoring vegetation water status at regional and global scales.
Submitted 11 December, 2020;
originally announced December 2020.
-
Nonlinear Distribution Regression for Remote Sensing Applications
Authors:
Jose E. Adsuara,
Adrián Pérez-Suay,
Jordi Muñoz-Marí,
Anna Mateo-Sanchis,
Maria Piles,
Gustau Camps-Valls
Abstract:
In many remote sensing applications one wants to estimate variables or parameters of interest from observations. When the target variable is available at a resolution that matches the remote sensing observations, standard algorithms such as neural networks, random forests or Gaussian processes are readily available to relate the two. However, we often encounter situations where the target variable is only available at the group level, i.e. collectively associated with a number of remotely sensed observations. This problem setting is known in statistics and machine learning as multiple instance learning or distribution regression. This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data. The presented formulation considers distribution embeddings in reproducing kernel Hilbert spaces, and performs standard least squares regression with the empirical means therein. A flexible version to deal with multisource data of different dimensionality and sample sizes is also presented and evaluated. It allows working with the native spatial resolution of each sensor, avoiding the need for match-up procedures. Noting the large computational cost of the approach, we introduce an efficient version via random Fourier features to cope with millions of points and groups.
Submitted 7 December, 2020;
originally announced December 2020.
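The formulation summarized in the abstract above (embed each group of observations as an empirical mean in a kernel feature space, then run least squares on those means) can be sketched with random Fourier features. This is a hedged illustration rather than the paper's implementation: the RBF kernel, the ridge penalty `lam`, the toy bags, and the helper names `rff_map`, `embed_bags` and `fit_ridge` are all assumptions.

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier features approximating an RBF kernel (assumed kernel)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def embed_bags(bags, W, b):
    """Empirical mean embedding: average the feature map over each bag."""
    return np.vstack([rff_map(X, W, b).mean(axis=0) for X in bags])

def fit_ridge(Phi, y, lam=1e-2):
    """Regularized least squares on the bag embeddings."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

rng = np.random.default_rng(0)
d, D, sigma = 2, 200, 1.0                      # input dim, features, kernel width
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Toy data (an assumption): each bag is a cloud of points around a hidden
# centre, and the single group-level label is the centre's first coordinate.
centers = rng.uniform(-2.0, 2.0, size=(60, d))
bags = [c + 0.3 * rng.normal(size=(rng.integers(20, 50), d)) for c in centers]
y = centers[:, 0]

Phi = embed_bags(bags, W, b)                   # one row per bag, any bag size
w = fit_ridge(Phi[:40], y[:40])                # train on the first 40 bags
pred = Phi[40:] @ w                            # predict the remaining 20
```

Because each bag is reduced to a fixed-length vector, bags of different sizes can be handled without any match-up step, and the cost scales with the number of random features rather than with the number of pairs of points.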
-
Synergistic Integration of Optical and Microwave Satellite Data for Crop Yield Estimation
Authors:
Anna Mateo-Sanchis,
Maria Piles,
Jordi Muñoz-Marí,
Jose E. Adsuara,
Adrián Pérez-Suay,
Gustau Camps-Valls
Abstract:
Developing accurate models of crop stress, phenology and productivity is of paramount importance, given the increasing need for food. Earth observation remote sensing data provide a unique source of information to monitor crops in a temporally resolved and spatially explicit way. In this study, we propose the combination of multisensor (optical and microwave) remote sensing data for crop yield estimation and forecasting using two novel approaches. We first propose the lag between the Enhanced Vegetation Index (EVI) derived from MODIS and the Vegetation Optical Depth (VOD) derived from SMAP as a new joint metric combining the information from the two satellite sensors in a single feature or descriptor. Our second approach avoids summarizing statistics and uses machine learning to combine full time series of EVI and VOD. This study considers two statistical methods, a regularized linear regression and its nonlinear extension called kernel ridge regression, to directly estimate the county-level surveyed total production, as well as individual yields of the major crops grown in the region: corn, soybean and wheat. The study area includes the US Corn Belt, and we use agricultural survey data from the National Agricultural Statistics Service (USDA-NASS) for the year 2015 for quantitative assessment.
Submitted 11 December, 2020;
originally announced December 2020.
-
Machine Learning Information Fusion in Earth Observation: A Comprehensive Review of Methods, Applications and Data Sources
Authors:
S. Salcedo-Sanz,
P. Ghamisi,
M. Piles,
M. Werner,
L. Cuadra,
A. Moreno-Martínez,
E. Izquierdo-Verdiguier,
J. Muñoz-Marí,
Amirhosein Mosavi,
G. Camps-Valls
Abstract:
This paper reviews the most important information fusion data-driven algorithms based on Machine Learning (ML) techniques for problems in Earth observation. Nowadays we observe and model the Earth with a wealth of observations, from a plethora of different sensors, measuring states, fluxes, processes and variables, at unprecedented spatial and temporal resolutions. Earth observation is well equipped with remote sensing systems, mounted on satellites and airborne platforms, but it also involves in-situ observations, numerical models and social media data streams, among other data sources. Data-driven approaches, and ML techniques in particular, are the natural choice to extract significant information from this data deluge. This paper provides a thorough review of the latest work on information fusion for Earth observation, with a practical intention, focusing not only on describing the most relevant previous works in the field, but also on the most important Earth observation applications where ML information fusion has obtained significant results. We also review some of the most widely used data sets, models and sources for Earth observation problems, describing their importance and how to obtain the data when needed. Finally, we illustrate the application of ML data fusion with a representative set of case studies, and we discuss the outlook for the near future of the field.
Submitted 7 December, 2020;
originally announced December 2020.
-
Nonlinear Complex PCA for spatio-temporal analysis of global soil moisture
Authors:
Diego Bueso,
Maria Piles,
Gustau Camps-Valls
Abstract:
Soil moisture (SM) is a key state variable of the hydrological cycle, needed to monitor the effects of a changing climate on natural resources. Soil moisture is highly variable in space and time, presenting seasonalities, anomalies and long-term trends, but also important nonlinear behaviours. Here, we introduce a novel fast and nonlinear complex PCA method to analyze the spatio-temporal patterns of the Earth's surface SM. We use global SM estimates acquired during the period 2010-2017 by ESA's SMOS mission. Our approach unveils both time and space modes, trends and periodicities, unlike standard PCA decompositions. Results show the distribution of the total SM variance among its different components, and indicate the dominant modes of temporal variability in surface soil moisture for different regions. The relationship of the derived SM spatio-temporal patterns with El Niño Southern Oscillation (ENSO) conditions is also explored.
Submitted 9 December, 2020;
originally announced December 2020.
-
Understanding Climate Impacts on Vegetation with Gaussian Processes in Granger Causality
Authors:
Miguel Morata-Dolz,
Diego Bueso,
Maria Piles,
Gustau Camps-Valls
Abstract:
Global warming is leading to unprecedented changes in our planet, with great societal, economic and environmental implications, especially with the growing demand for biofuels and food. Assessing the impact of climate on vegetation is a pressing need. We approached the attribution problem with a novel nonlinear Granger causal (GC) methodology and used a large data archive of remote sensing satellite products and environmental and climatic variables, spatio-temporally gridded over more than 30 years. We generalize kernel Granger causality by considering the variables' cross-relations explicitly in Hilbert spaces, and use the covariance in Gaussian processes. The method generalizes the linear and kernel GC methods, and comes with tighter bounds of performance based on Rademacher complexity. Spatially explicit global Granger footprints of precipitation and soil moisture on vegetation greenness are identified more sharply than with previous GC methods.
Submitted 6 December, 2020;
originally announced December 2020.
-
Estimation of vegetation loss coefficients and canopy penetration depths from SMAP radiometer and IceSAT lidar data
Authors:
M. Baur,
T. Jagdhuber,
M. Link,
M. Piles,
D. Entekhabi,
A. Fink
Abstract:
In this study the framework of the $τ$-$ω$ model is used to derive vegetation loss coefficients and canopy penetration depths from SMAP multi-temporal retrievals of vegetation optical depth, single scattering albedo and ICESat lidar vegetation heights. The vegetation loss coefficients serve as a global indicator of how strongly absorption and scattering processes attenuate L-band microwave radiation. By inverting the vegetation loss coefficients, penetration depths into the canopy can be obtained, which are displayed for the global forest reservoirs. A simple penetration index is formed by combining vegetation heights and penetration depth estimates. The distribution and level of this index reveal that for densely forested areas the soil signal is attenuated considerably, which can affect the accuracy of soil moisture retrievals.
Submitted 6 December, 2020;
originally announced December 2020.
-
SMAP-based retrieval of vegetation opacity and albedo
Authors:
Dara Entekhabi,
Alexandra Konings,
Maria Piles,
Narendra Das
Abstract:
Over land the vegetation canopy affects the microwave brightness temperature by emission, scattering and attenuation of surface soil emission. The questions addressed in this study are: 1) what is the transparency of the vegetation canopy for different biomes around the globe at the low-frequency L-band?, 2) what is the seasonal amplitude of vegetation microwave optical depth for different biomes?, 3) what is the effective scattering at this frequency for different vegetation types?, 4) what is the impact of imprecise characterization of vegetation microwave properties on retrieval of soil surface conditions? These questions are addressed based on the recently completed one full annual cycle of measurements from the NASA Soil Moisture Active Passive (SMAP) mission.
Submitted 6 December, 2020;
originally announced December 2020.
-
Preliminary assessment of an integrated SMOS and MODIS application for global agricultural drought monitoring
Authors:
N. Sánchez,
A. González-Zamora,
J. Martínez-Fernández,
M. Piles,
M. Pablos,
Brian Wardlow,
Tsegaye Tadesse,
Mark Svoboda
Abstract:
An application of the Soil Moisture Agricultural Drought Index (SMADI) at the global scale is presented. The index integrates surface soil moisture from the SMOS mission with land surface temperature (LST) and Normalized Difference Vegetation Index (NDVI) from MODIS and allows for global drought monitoring at medium spatial scales (0.05 deg). Biweekly maps of SMADI were obtained from 2010 to 2015 over all agricultural areas on Earth. The SMADI time series were compared with state-of-the-art drought indices over the Iberian Peninsula. Results show good agreement between SMADI and the Crop Moisture Index (CMI) retrieved at five weather stations (with correlation coefficient R from -0.64 to -0.79) and the Soil Water Deficit Index (SWDI) at the Soil Moisture Measurement Stations Network of the University of Salamanca (REMEDHUS) (R=-0.83). Some preliminary tests were also made over the continental United States using the Vegetation Drought Response Index (VegDRI), with very encouraging results regarding the spatial occurrence of droughts during summer seasons. Additionally, SMADI allowed the identification of distinctive patterns of regional drought over the Indian Peninsula in spring of 2012. Overall, the results support the use of SMADI for monitoring agricultural drought events worldwide.
Submitted 6 December, 2020;
originally announced December 2020.
-
Comparison of downscaling techniques for high resolution soil moisture mapping
Authors:
Sabah Sabaghy,
Jeffrey Walker,
Luigi Renzullo,
Ruzbeh Akbar,
Steven Chan,
Julian Chaubell,
Narendra Das,
R. Scott Dunbar,
Dara Entekhabi,
Anouk Gevaert,
Thomas Jackson,
Olivier Merlin,
Mahta Moghaddam,
Jinzheng Peng,
Jeffrey Piepmeier,
Maria Piles,
Gerard Portal,
Christoph Rudiger,
Vivien Stefan,
Xiaoling Wu,
Nan Ye,
Simon Yueh
Abstract:
Soil moisture impacts exchanges of water, energy and carbon fluxes between the land surface and the atmosphere. Passive microwave remote sensing at L-band can capture spatial and temporal patterns of soil moisture in the landscape. Both ESA and NASA have launched L-band radiometers, in the form of the SMOS and SMAP satellites respectively, to monitor soil moisture globally, every 3 days at about 40 km resolution. However, their coarse scale restricts the range of applications. While SMAP included an L-band radar to downscale the radiometer soil moisture to 9 km, the radar failed after 3 months, and this initial approach is no longer applicable to developing a consistent long-term soil moisture product across the two missions. Existing optical-, radiometer-, and oversampling-based downscaling methods could be an alternative to the radar-based approach for delivering such data. Nevertheless, retrieval of a consistent high-resolution soil moisture product remains a challenge, and there has been no comprehensive inter-comparison of the alternative approaches. This research undertakes an assessment of the different downscaling approaches using the SMAPEx-4 field campaign data.
Submitted 6 December, 2020;
originally announced December 2020.
-
Remote sensing of vegetation dynamics in agro-ecosystems using SMAP vegetation optical depth and optical vegetation indices
Authors:
M. Piles,
D. Chaparro,
D. Entekhabi,
A. G. Konings,
T. Jagdhuber,
G. Camps-Valls
Abstract:
ESA's SMOS and NASA's SMAP missions, launched in 2009 and 2015, respectively, are the first two missions carrying on-board L-band microwave sensors, which are very sensitive to the water content in soils and vegetation. Focusing on the vegetation signal at L-band, we have implemented an inversion approach for SMAP that allows deriving vegetation optical depth (VOD, a microwave parameter related to biomass and plant water content) alongside soil moisture, without reliance on ancillary optical information on vegetation. This work aims at using these new observational data to monitor the phenology of crops in major global agro-ecosystems and enhance present agricultural monitoring and prediction capabilities. Core agricultural regions have been selected worldwide covering major crops (corn, soybean, wheat, rice). The complementarity and synergies between the microwave vegetation signal, sensitive to biomass water-uptake dynamics, and optical indices, sensitive to canopy greenness, are explored. Results reveal the value of L-band VOD as an independent ecological indicator for global terrestrial biosphere studies.
Submitted 6 December, 2020;
originally announced December 2020.
-
Explicit Granger causality in kernel Hilbert spaces
Authors:
Diego Bueso,
Maria Piles,
Gustau Camps-Valls
Abstract:
Granger causality (GC) is undoubtedly the most widely used method to infer cause-effect relations from observational time series. Several nonlinear alternatives to GC have been proposed based on kernel methods. We generalize kernel Granger causality by considering the variables' cross-relations explicitly in Hilbert spaces. The framework is shown to generalize the linear and kernel GC methods, and comes with tighter bounds of performance based on Rademacher complexity. We successfully evaluate its performance on standard dynamical systems and in identifying the arrow of time in coupled Rössler systems, and we exploit it to disclose the footprints of the El Niño-Southern Oscillation (ENSO) phenomenon on soil moisture globally.
Submitted 29 November, 2020;
originally announced November 2020.
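For orientation, the linear Granger causality baseline that the abstract above says the framework generalizes can be sketched as a comparison of two autoregressive fits. This is the classical linear test statistic, not the explicit kernel method proposed in the paper; the lag order `p`, the toy system, and the helper names `lagged`, `residual_variance` and `granger_index` are assumptions.

```python
import numpy as np

def lagged(series, p):
    """Stack lags 1..p of a 1-D series as columns, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])

def residual_variance(target, design):
    """Residual variance of an ordinary least squares fit with intercept."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.mean((target - design @ coef) ** 2)

def granger_index(x, y, p=2):
    """log(restricted variance / full variance) for the hypothesis x -> y.

    The restricted model predicts y from its own past; the full model also
    uses the past of x. Larger values mean the past of x helps more.
    """
    target = y[p:]
    restricted = residual_variance(target, lagged(y, p))
    full = residual_variance(
        target, np.column_stack([lagged(y, p), lagged(x, p)])
    )
    return np.log(restricted / full)

# Toy system (an assumption): x drives y with a one-step delay.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

gc_xy = granger_index(x, y)   # expected to be clearly positive
gc_yx = granger_index(y, x)   # expected to stay close to zero
```

A kernel variant would replace the linear regressions with regressions in a reproducing kernel Hilbert space, so that nonlinear dependence on the past can also reduce the residual variance.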
-
Learning drivers of climate-induced human migrations with Gaussian processes
Authors:
Jose M. Tarraga,
Maria Piles,
Gustau Camps-Valls
Abstract:
In the current context of climate change, extreme heatwaves, droughts, and floods are impacting not only the biosphere and atmosphere but the anthroposphere too. Human populations are being forcibly displaced, and the people affected are now referred to as climate-induced migrants. In this work, we investigate which climate and structural factors forced major human displacements in the presence of floods and storms during the years 2017-2019. We built, curated, and harmonized a database of meteorological and remote sensing indicators along with structural factors of 27 developing countries worldwide. We show how we can use Gaussian processes to learn which variables can explain the impact of floods and storms in the context of forced displacements and to develop models that reproduce migration flows. Our results at regional, global, and disaster-specific scales show the importance of structural factors in determining the magnitude of displacements. The study may have societal, political, and economic implications.
Submitted 17 November, 2020;
originally announced November 2020.
-
Living in the Physics and Machine Learning Interplay for Earth Observation
Authors:
Gustau Camps-Valls,
Daniel H. Svendsen,
Jordi Cortés-Andrés,
Álvaro Moreno-Martínez,
Adrián Pérez-Suay,
Jose Adsuara,
Irene Martín,
Maria Piles,
Jordi Muñoz-Marí,
Luca Martino
Abstract:
Most problems in Earth sciences aim to make inferences about the system, where accurate predictions are just a tiny part of the whole problem. Inference means understanding the relations between variables and deriving models that are physically interpretable, simple and parsimonious, and mathematically tractable. Machine learning models alone are excellent approximators, but very often do not respect the most elementary laws of physics, like mass or energy conservation, so consistency and confidence are compromised. In this paper, we describe the main challenges ahead in the field, and introduce several ways to live in the Physics and machine learning interplay: to encode differential equations from data, constrain data-driven models with physics priors and dependence constraints, improve parameterizations, emulate physical models, and blend data-driven and process-based models. This is a collective long-term AI agenda towards developing and applying algorithms capable of discovering knowledge in the Earth system.
Submitted 18 October, 2020;
originally announced October 2020.
-
Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis
Authors:
J. Emmanuel Johnson,
Valero Laparra,
Maria Piles,
Gustau Camps-Valls
Abstract:
Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data is high-dimensional, heterogeneous and has non-linear characteristics. In this paper, we apply multivariate Gaussianization for probability density estimation which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual information. We demonstrate how information theory measures can be applied in various Earth system data analysis problems. First, we show how the method can be used to jointly Gaussianize radar backscattering intensities, synthesize hyperspectral data, and quantify the information content in aerial optical images. We also quantify the information content of several variables describing the soil-vegetation status in agro-ecosystems, and investigate the temporal scales that maximize their shared information under extreme events such as droughts. Finally, we measure the relative information content of space and time dimensions in remote sensing products and model simulations involving long records of key variables such as precipitation, sensible heat and evaporation. Results confirm the validity of the method, for which we anticipate a wide use and adoption. Code and demos of the implemented algorithms and information-theory measures are provided.
Submitted 25 November, 2020; v1 submitted 13 October, 2020;
originally announced October 2020.
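The abstract above names entropy, total correlation, and mutual information as the quantities of interest. As background only, and not the Gaussianization procedure itself, these measures have closed forms when the density is Gaussian. The sketch below covers the two-variable case, where total correlation equals mutual information. Function names are illustrative, and values are in nats.

```python
import math


def gaussian_entropy_1d(variance):
    """Differential entropy of a univariate Gaussian, in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def gaussian_entropy_2d(var_x, var_y, cov_xy):
    """Differential entropy of a bivariate Gaussian, in nats."""
    det = var_x * var_y - cov_xy ** 2  # determinant of the 2x2 covariance
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det)


def gaussian_total_correlation_2d(var_x, var_y, cov_xy):
    """Sum of marginal entropies minus joint entropy.

    For two variables this equals the mutual information and reduces to
    -0.5 * log(1 - rho**2), where rho is the correlation coefficient.
    """
    marginals = gaussian_entropy_1d(var_x) + gaussian_entropy_1d(var_y)
    return marginals - gaussian_entropy_2d(var_x, var_y, cov_xy)


if __name__ == "__main__":
    rho = 0.6
    print(gaussian_total_correlation_2d(1.0, 1.0, rho))
    print(-0.5 * math.log(1.0 - rho ** 2))  # same value from the closed form
```

The result depends only on the correlation, not on the marginal variances, which is why rescaling either variable leaves it unchanged.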
-
Nonlinear PCA for Spatio-Temporal Analysis of Earth Observation Data
Authors:
Diego Bueso,
Maria Piles,
Gustau Camps-Valls
Abstract:
Remote sensing observations, products and simulations are fundamental sources of information to monitor our planet and its climate variability. Uncovering the main modes of spatial and temporal variability in Earth data is essential to analyze and understand the underlying physical dynamics and processes driving the Earth System. Dimensionality reduction methods can work with spatio-temporal datasets and decompose the information efficiently. Principal Component Analysis (PCA), also known as Empirical Orthogonal Functions (EOF) in geophysics, has been traditionally used to analyze climatic data. However, when nonlinear feature relations are present, PCA/EOF fails. In this work, we propose a nonlinear PCA method to deal with spatio-temporal Earth System data. The proposed method, called Rotated Complex Kernel PCA (ROCK-PCA for short), works in reproducing kernel Hilbert spaces to account for nonlinear processes, operates in the complex kernel domain to account for both space and time features, and adds an extra rotation for improved flexibility. The result is an explicitly resolved spatio-temporal decomposition of the Earth data cube. The method is unsupervised and computationally very efficient. We illustrate its ability to uncover spatio-temporal patterns using synthetic experiments and real data. Results of the decomposition of three essential climate variables are shown: satellite-based global Gross Primary Productivity (GPP) and Soil Moisture (SM), and reanalysis Sea Surface Temperature (SST) data. The ROCK-PCA method allows identifying their annual and seasonal oscillations, as well as their non-seasonal trends and spatial variability patterns.
Submitted 27 January, 2020;
originally announced February 2020.
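The linear PCA/EOF baseline mentioned in the abstract above can be sketched as follows: remove the time mean at each location, then find the spatial pattern that captures the most variance. This is only the classical linear decomposition, not ROCK-PCA (no kernel, no complex domain, no rotation). The synthetic field, the power-iteration solver, and the names `synthetic_field` and `leading_eof` are assumptions made for illustration.

```python
import math
import random


def synthetic_field(n_time=120, noise=0.3, seed=0):
    """Hypothetical data: one spatial pattern modulated by a 12-step cycle."""
    rng = random.Random(seed)
    pattern = [1.0, 2.0, 3.0, 2.0, 1.0]
    data = [
        [2.0 * math.sin(2.0 * math.pi * t / 12.0) * p + rng.gauss(0.0, noise)
         for p in pattern]
        for t in range(n_time)
    ]
    return data, pattern


def leading_eof(data, iterations=200):
    """Leading EOF (unit-norm spatial pattern) and its principal component.

    `data` is a list of time steps, each a list of values over locations.
    Power iteration is applied to the spatial covariance implicitly, as
    A^T (A v), where A holds the anomalies (time mean removed per location).
    """
    n_time, n_space = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_time for j in range(n_space)]
    anomalies = [[row[j] - means[j] for j in range(n_space)] for row in data]

    v = [1.0 / math.sqrt(n_space)] * n_space  # initial guess
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in anomalies]
        w = [sum(scores[t] * anomalies[t][j] for t in range(n_time))
             for j in range(n_space)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]

    # Project the anomalies onto the EOF to get the time series (PC).
    pc = [sum(a * b for a, b in zip(row, v)) for row in anomalies]
    return v, pc


if __name__ == "__main__":
    data, pattern = synthetic_field()
    eof, pc = leading_eof(data)
    print("leading EOF:", [round(c, 3) for c in eof])
```

On this synthetic field the leading EOF should align closely with the planted spatial pattern, up to sign, and the principal component should follow the 12-step cycle.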